Thomas P. Cheatham, Jr.


Thomas P. Cheatham, Jr. (A’52—SM’54) was 
born in Washington, D.C., on January 30, 1923. He 
received the B.S. degree from the United States 
Coast Guard Academy in 1948, and the M.S. and 
Sc.D. degrees from the Massachusetts Institute of
Technology in 1947 and 1952, respectively. 

From 1943 to 1946, Dr. Cheatham served as 
both line and engineering officer in the U.S. Coast
Guard, and first became interested in the field of 
electronics through his close association with the 
Coast Guard’s operational and development program 
in radar, radar beacons, and Loran. 

From 1946 to 1949, he was a research associate 
at the M.I.T. Research Laboratory of Electronics, 
where he did work on impulse noise in FM receivers 
and developed an early interest in statistical com- 
munication theory. 

In 1949, Dr. Cheatham visited Norway as a 
consultant to the Norwegian Defense Department, 
working particularly at Bergen in the formation of 
a new radar division and assisting in the develop- 
ment of a microwave communication relay system 
connecting Bergen with Oslo. He returned to the 
United States in 1950 to complete his graduate 
studies and to join the Physical Research Labora- 
tories of Boston University as electronic section 
head and instructor in the Physics Department. 
During that time, he was principally concerned with 
research and development on single line scan 
television systems and in the synthesis of optical
and electro-optical devices. In 1952, Dr. Cheatham
accepted an appointment as Research Fellow at 
Harvard University where he carried out research 
in the field of random processes and statistical
communication theory. During the period from 1949 
to 1953, he did extensive consultive work in industry 
and for the government in the field of statistical 
communications, and particularly on optimal tech- 
niques for data processing, criteria for design of 
prediction computers, network analysis and 
synthesis. 

In 1953, he joined industry on a full-time basis, 
becoming Director of Research for Melpar, Inc., 
Boston, Mass. He has directed the growth of this 
department from its initial formation as a research 
group to its present size of two laboratories en- 
compassing the broad fields of electronics, physics, 
and data processing. 

Dr. Cheatham is a member of Sigma Xi, and the 
author of several papers in the field of information 
theory. He has long been active in both local and 
national IRE activities. He was elected Chairman 
of the Boston Section in 1955 and has held various 
other offices both before and since then. He has been 
a member of the Information Theory and Modu- 
lation System Committee since 1952 and a member 
of the Administrative Committee since 1956. He was 
appointed Business Manager in 1957, and elected 
Chairman of the Professional Group on Information 
Theory in 1958.
A Broader Base for the PGIT


THOMAS P. CHEATHAM, JR. 


It is recognized that the PGIT has achieved at 
least two of its essential objectives, those of technical 
stature and a unique identity within its professional 
society. The evolution of our editorial policies and 
the high standards that have been achieved for 
publication of our TRANSACTIONS when combined 
with the several outstanding symposia that have 
been sponsored by the PGIT give us a great deal 
of which to be proud. 

However, the achievement of these objectives 
has been the work of a relatively few individuals 
whom we might refer to as the “hard core” of our
organization. The ability to sustain the present 
position requires the establishment of an organi- 
zational structure that enlarges the amount of 
active participation from the PGIT membership 
and which, at the same time, is re-oriented to be 
compatible with long-range plans and objectives. 
We feel that our primary purpose is associated with 
the word “education” and more specifically, the
education of the PGIT membership in tune with 
current interests and trends. Although the PGIT 
needs reasonably precise and unambiguous con- 
straints to guide it, these should be dynamic in 
nature rather than static. For example, I feel that 
the name of our Professional Group on Information 
Theory is too restricted, and might well be supple- 
mented by an appropriate subtitle to augment and 
explain the more general and present scope of 
interest of the whole membership. Such a subtitle 
might be: “The Transmission and Processing of 
Information.” Thus, the systems engineers in various
fields and the computer designer, who apply the 
basic concepts of information theory to their 
problems as guide lines in design and as a means of 
establishing criteria for efficiency and reliability, 
should have a more obvious position and interest in
our group. It is important to recognize the value of
this industrial interest in systems and their exami- 
nation for compatibility with the basic tenets of 
information theory as a means for providing a 
logical servo link by which new problems are returned 
to the basic and applied theorist.

I would like also to suggest that there is fair 
justification for recognizing a need for two types of 
symposia. One is slanted towards the basic research 
work in the field and has as its principal audience, 
the serious full-time basic research members of the 
group. The model for this type of symposia is the 
M.I.T. symposia of past years that have been held
biannually. It is strongly felt that the technical 
character of this meeting should not change other 
than to perhaps grow with increasing emphasis on 
its national and international character. It is 
suggested that interleaved with this type of symposia 
and operating on alternate years with it, there should 
be a broader based type of symposium which empha- 
sizes system applications and fringe areas where we 
overlap with other professional interests and disciplines.
For example, plans are currently in progress
for holding a joint Information Theory-Circuit The- 
ory symposium in Los Angeles in June of next year. 

To achieve a broader base and a more flexible 
attitude toward current interest and trends of our 
membership, it will be necessary to increase the 
number of working committees and subcommittees 
so that new interests, new concepts, and new blood 
can be heard. Special issues of the TRANSACTIONS 
will be sparked and guided by our newly formed 
Editorial Board. My hope is that we as a professional 
group will explore with some spirit of adventure, 
that we will be organized efficiently, but with 
sufficient flexibility for enthusiasm and creative 
thinking to permeate our efforts.
A Statement of Editorial Policy


The IRE TRANSACTIONS ON INFORMATION THEORY 
is a quarterly journal devoted to the publication of 
papers on the transmission, processing, and utilization
of information. The exact subject matter of
acceptable papers is intentionally, by editorial 
policy, not sharply delimited. Rather, it is hoped 
that as the focus of research activity changes, a 
flexible policy will permit the TRANSACTIONS to 
follow suit and that it will continue to serve its 
readers with timely articles on the fundamental 
nature of the communication process. Topics of 
current appropriateness include extensions of the 
information theories of Shannon and Wiener and 
their ramifications, analyses and design of com- 
munication systems, information sources, pattern 
recognition, receiving and detection, automata and 
learning, large-scale information processing systems, 
and so forth. 

Papers can be of two kinds: tutorial or research, 
and should be so indicated. The former must be 
well-written expositions summarizing the state of a 
field in which research is still in progress, or else 
bring together as a unity results scattered in the 
literature. Research papers must be original contributions
not published elsewhere. They must present new methods,
concepts or ideas, or extend old ones to new areas of
applicability; or, they must present new data, findings or
inventions, or solve
new problems of more than casual interest. They 
will not be accepted if, in the view of the reviewers 
and editors, they constitute a straightforward and 
easy application of existing theory to a special case 
of limited interest. It is not necessary that the length 
of each research paper be great; on the contrary, 
the submission of short but formal research notes 
is to be encouraged. These will not be published as 
correspondence, but will be subject to the same 
review standards as longer papers. 

In addition to papers, readers are invited to sub- 
mit notes to the Correspondence section. These may 
include early summaries of important work to be 
published later at greater length, remarks on ma- 
terial that has already appeared, and so forth. 
Reasonable contributions to this section will be 
published without editorial review. 

All manuscripts should be prepared with clarity 
of style and economy of mathematical notation 
always in mind. Unusual symbols are to be avoided 
and display formulas are to be kept to a minimum 
consistent with clear exposition. Related work 
should be adequately referenced and references to 
readily available literature should be made in place 
of repeating existing derivations and arguments. 


—The Administrative Committee 


Summary: The measurements made on a system containing
noise are usually time averages of the signals, or of quantities
defined in terms of the signals. Such measurements are called
time statistics. The object of this paper is to develop the theory
of time statistics and in turn to give methods for calculating them.
For the most part the time statistics are formulated in terms of
ensemble statistics which are usually provided by statistical
mechanics.

If a process consists of, say, all physically realizable models of
a system containing noisy resistors, there is no practical way to
identify which model one has available for “testing.” Thus, a time
statistic measured with the available model will not be predictable
unless this statistic is the same for almost all the models; when
this is the case, the process is called uniform¹ for this statistic. A
dual property is in common use for ensemble statistics. The process
is called stationary for an ensemble statistic, provided it is
the same at all times. Though some discussion of stationarity is
given in this paper, the emphasis is on not requiring stationarity.
In particular, special attention is given to nonstationarity introduced
by determinate signals. While stationarity plays only a minor
role in the theory of the time statistics of noise, uniformity plays a
crucial role. Given only uniformity, Theorem 1 formulates time
statistics as the time average of the corresponding ensemble
statistics. The additional condition of stationarity merely simplifies the
calculation by rendering the “ergodic hypothesis” satisfied, i.e.,
by rendering equality of time and ensemble statistics.

With Theorem 1 as a nucleus, the remainder of the paper attempts
to develop an understanding of what makes a process
uniform. There is no attempt to give detailed proofs, but there is
an effort to maintain a clear distinction between physical
motivations, the definitions, and the theorems. Some elementary
sample calculations of practical interest are included; these serve
to illustrate several parts of the theory. Though calculations
involving such problems as the evaluation of difficult integrals do
arise in some applications of the theory, simple samples have been
used here, since they are adequate as an aid to understanding the
theory.


I. INTRODUCTION


Background material on noise theory is found
in the Bibliography. Lucid accounts of the history
of ergodic theory are given in [6], [14]; Loève [10]
briefly brings the history up to date. Practical insight for
the work is provided in a very helpful manner [1], [2],
[8], [9], [11], as is the mathematical background [1], [3],
[7], [10], [12].

Motivation for this paper results from the inadequate 
descriptions in the existing literature of the ties between 
probabilistic notions associated with noise ensembles and 
time-average measurements. In particular, it is intended 
that this paper fill the gap between brief discussions on 
ergodicity of the type found in the engineering literature 


* Manuscript received by the PGIT, April 25, 1958. This paper 
is based on the author’s Doctoral dissertation. Some of the work 
was supported by USAF Contract No. 33(616)-3374 at the Radia- 
tion Lab., The Johns Hopkins University, Baltimore, Md. 

† Elec. Eng. Dept., University of Mich., Ann Arbor, Mich.

¹ The term “ergodic” is reserved for a condition that makes a
process uniform for a large class of statistics. Also, it should be
mentioned that “almost all” means for all models, except possibly
a set of measure zero, and that terms in italic are being defined.
Time Statistics of Noise*


WILLIAM M. BROWN†


(cf. [8], p. 114) and mathematical details such as those
given by Doob [4] and Loève [10]. Section II of this paper
gives a fresh organization to the theory underlying time
statistics. Though the technique in the conclusion of
Theorem 1 has been used by other workers, the theorem has not
been publicized nor has it been used elsewhere to organize
the theory.

While some of the material presented here is new, 
the exact status of the various parts is not described; 
the general organization and Section II-D are probably 
the major contributions. 

The following characterize noise analysis. A collection 
of physical things (models), rather than only one thing, 
must be analyzed, and average values are the quantities 
of chief concern. The collection is a physical necessity 
when one wishes to analyze a system for which certain 
factors are not completely specified. It is common not to 
specify certain details about the microscopic structure of 
elements, such as noisy resistors. If only macroscopic 
properties, such as resistance value, are fixed, different 
realizations give rise to different noise waveforms. 
Analysis, which is to be applicable to any realization, is 
limited in that usually only average values are pre- 
dictable. 

Stated briefly, a noise process is a (finite) set of real-valued
functions of two variables. Each function is called
a noise ensemble. The first variable (typical point denoted
by $\alpha$) depicts the member of the ensemble (also
called model or realization here); its domain is assumed
to be a probability measure space $\Gamma$. The second variable
is usually over the real line² (time).

Fig. 1 illustrates the models of a noise process. For
example, each model could be a particular realization of a
certain type of radar and the ensemble could consist of
all possible realizations. The functions would be various
electrical quantities. The f-noise ensemble, which is composed
of $f_\alpha(t), f_\sigma(t), f_\beta(t), \ldots$, would be formed by taking
these time functions from corresponding places, such as the
input voltages of the $\alpha$ radar, $\sigma$ radar, $\beta$ radar, $\ldots$,
respectively; the g ensemble, which is composed of
$g_\alpha(t), g_\sigma(t), g_\beta(t), \ldots$, might be taken as the currents from some
particular wire in the respective radars, etc. It may be
that almost all functions of time in, say, the f ensemble
are the same, in which case the f ensemble is called
determinate. Typically, the intended signals in a system
are determinate, save possibly for absolute time reference,
but this is accounted for in the section on determinate
signals (II-D).


² The domain for the second variable might be only the integers,
in which case the process is called discrete. Actually, it is easy to
adjust this paper to cover any set for the second variable, provided
there is a notion of average (denoted by A later) and a notion of
translation associated with this variable. The author takes the real
line only to be specific.

If F is a real-valued function of $\alpha$, the ensemble average
(expected value) of F is denoted by E(F) or by
$\int_\Gamma F(\alpha)\,d\mu(\alpha)$. A lengthy discussion of this integral is not
included, but the following basic properties are mentioned.
E is a linear operator; if $F(\alpha) = 1$ for each $\alpha$ in $\Gamma$, E(F) = 1;
and if $F(\alpha) \ge 0$ for each $\alpha$ in $\Gamma$ and E(F) exists, $E(F) \ge 0$.
Also, if B is a set of points in $\Gamma$ and $\phi_B(\alpha)$ is taken as one
when $\alpha$ is in B, and as zero when $\alpha$ is not in B, then B
is measurable if, and only if, $E(\phi_B)$ exists and $E(\phi_B)$ is the
measure of B. Finally, if the set of $\alpha$’s for which $F(\alpha) > C$
is measurable for each real C, then F is called a measurable
function of $\alpha$.


Fig. 1. Ensemble of models, showing model $\sigma$, model $\alpha$, and model $\beta$.


Similarly, if F is a function of time, the time average of F
is denoted by A(F) and defined by

$$A(F) \triangleq \lim_{T\to\infty} \frac{1}{2T}\int_{0}^{T} F(t)\,dt + \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{0} F(t)\,dt.$$

(“$\triangleq$” means defined as.)


Taking either of these terms alone also provides a useful
definition of the time average of F, and using limits in
the mean (with respect to $\alpha$ integration, where F is
assumed to be a function of $\alpha$ as well as t) also provides
useful definitions of time average. We take the indicated
definition in order to be specific. Observe that if A(F)
exists,

$$\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} F(t)\,dt$$

exists; this is sometimes used for the definition of A.
However, the definition used here has the advantage that
the existence of A(F) is all that is needed to prove that
A(F) is independent of phase, i.e., $A[F(t)] = A[F(t + c)]$
for all c.
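As a rough numerical illustration of the time average and its independence of phase, the sketch below approximates A(F) at a large finite T as the sum of the two one-sided terms. The waveform `example`, the shift 0.7, and the values of `T` and `n` are hypothetical choices made for illustration only; none of them come from the paper.

```python
import math

def time_average(F, T=2000.0, n=200_000):
    """Approximate A(F) by the sum of the two one-sided terms at a large finite T."""
    dt = T / n
    # Midpoint rule for (1/2T) times the integral of F over [0, T].
    right = sum(F((k + 0.5) * dt) for k in range(n)) * dt / (2.0 * T)
    # Midpoint rule for (1/2T) times the integral of F over [-T, 0].
    left = sum(F(-(k + 0.5) * dt) for k in range(n)) * dt / (2.0 * T)
    return right + left

def example(t):
    """Hypothetical test waveform whose exact time average is 1.5."""
    return 1.0 + math.cos(t) ** 2

def shifted(t, c=0.7):
    """The same waveform with its phase advanced by c."""
    return example(t + c)

avg = time_average(example)
avg_shifted = time_average(shifted)
print(avg, avg_shifted)
```

Both numbers should lie close to 1.5; the small residual comes from truncating the limit at a finite T, not from the phase shift.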

Now let F be a function of $\alpha$ and t. The ensemble statistic
for F is defined as E(F). The process is stationary for F
if E(F) is the same for all t. The time statistic for F is
defined as A(F) and the process is uniform for F if A(F)
is the same for almost all $\alpha$. “Almost all $\alpha$” will sometimes
be abbreviated by a.e. (almost everywhere). Before
proceeding with the theory, some examples of typical
relations between F and the noise ensemble will be given.

The nth cumulative distribution of f. If $J(y_1, \ldots, y_n) = 1$
when every $y_k \ge 0$ and $J = 0$ otherwise, then the nth
ensemble distribution of f evaluated at
$(x_1, x_2, \ldots, x_n;\ t + t_1, \ldots, t + t_n)$ is given by
$E\,J[x_1 - f_\alpha(t_1 + t), \ldots, x_n - f_\alpha(t_n + t)]$. If E is replaced by A, one has the
nth time distribution of f. Of course, if the x’s and t’s are
fixed, J above can be considered a function F of $\alpha$ and t.

A joint characteristic function of f and g. Here one may
take

$$F(\alpha, t) = \exp\left\{ i\left[ \sum_{k} q_k f_\alpha(t_k + t) + \sum_{j} r_j g_\alpha(u_j + t) \right] \right\}.$$

Then the joint ensemble characteristic function of f and
g, evaluated at the $t_k$, $q_k$, $u_j$, and $r_j$ values indicated, is
given by E(F), and A(F) gives the corresponding time
statistic.

If P represents all the parameters, such as the q and r
variables in the last example,

$$f(\alpha, t, n) = [f_\alpha(t_1 + t), f_\alpha(t_2 + t), \ldots, f_\alpha(t_n + t)],$$
$$g(\alpha, t, m) = [g_\alpha(u_1 + t), \ldots, g_\alpha(u_m + t)],$$

etc., then the general (joint) statistic for the ensembles
$f, g, \ldots, h$ is formulated as follows. Let

$$F(\alpha, t) = H[f(\alpha, t, n), g(\alpha, t, m), \ldots, h(\alpha, t, l), P].$$


The ensemble statistic for this F is E(F) and the time
statistic is A(F). Here H denotes a function of many real
variables; the values of H (and hence of F) are either real
or in a cartesian space having a dimensionality greater
than one. In the latter case, H can be considered an
ordered n-tuple of real-valued functions and the average
(E or A) of H is taken as the ordered n-tuple of averages.
One might take H to be a more general mapping than
that provided by a function of many real variables, but
again it seems better to be specific than to strive for more
generality.


II. FUNDAMENTAL THEOREMS


The program is first to give Theorem 1 which, though 
easy to prove, is important, comprehensive, and useful. 
The remainder of the section supplements Theorem 1 
largely by describing conditions that render a process 
uniform. The development might be compared to the 
theory of Fourier series in that one can first display the 
basic nature and use of Fourier series by assuming that 
certain manipulations are permitted and then proceed 
to acquire insight by discovering what makes the manipulations
valid. Roughly, this section (after Theorem 1)
displays the following: how indecomposability (metric
transitivity) implies uniformity for a specific F, how this
is generalized to a large class of statistics for nondeterminate
signals, with stationarity then brought in to render
the ergodic hypothesis satisfied, and finally how ergodicity
joined with independence over time implies uniformity
when determinate and nondeterminate signals are present.


A. Time Average of Ensemble Statistics 


As pointed out in the Introduction, essentially all 
statistics can be formulated as averages of a function 
$F(\alpha, t)$. In applications of the following theorem, E(F) is
usually given and then A(F) is to be calculated.


Theorem 1: Let the process be uniform for F (i.e., let A(F)
be the same a.e.) and let AE(F) = EA(F); then A(F) =
AE(F) for almost all members. (The converse is also true.)

To see this we start with AE(F) = EA(F); but A(F) is
constant a.e. and hence EA(F) is equal to this common
value. That is, A(F) = AE(F) a.e.

In the hypothesis of this theorem there are three items
for concern, i.e., the existence of A(F), the interchange of
the integrations defining A and E, and the uniformity of
the process. Theorems that give the existence of A(F)
are called ergodic theorems and these receive much
attention in the mathematical literature; the two most
famous ergodic theorems are stated later. Standard
integration theory can be used to justify the interchange of
A and E, the following lemma being typical. Finally,
uniformity is given rather thorough study in this paper.


Lemma: Let F be (Lebesgue-Stieltjes) measurable on the
cartesian product space of $\Gamma$ and the real line where the
usual product measure is used; let F be bounded, and let
A(F) exist almost everywhere on $\Gamma$; then AE(F) and
EA(F) exist and are equal.

Standard theorems, such as
Fubini’s on interchanging the order of integrations, are
the major tools required for the proof of this lemma. The
“ergodic hypothesis” (cf. [8], p. 114 or cf. [6], p. 52) will
now be derived as a corollary to Theorem 1.


Corollary: Let the process be stationary as well as uniform
for F; then, granting AE(F) = EA(F), A(F) = E(F).

From the theorem, A(F) = AE(F), but by stationarity
E(F) is independent of time and hence AE(F) = E(F).

A sample calculation using Theorem 1 will be discussed.
This example requires the generality of Theorem
1; however, the two calculations on modulated waves
given in Section III might be handled by the more restricted
results given in Section II-D. Here we consider a
sampling circuit that samples the input f at times $\{t_k\}$.
The output is denoted by h and the sampling functions,
which are determinate, are denoted by $\{g_k\}$. Then the
equation giving the output in terms of the input is assumed
to be

$$h_\alpha(t) = \sum_{k} f_\alpha(t_k)\, g_k(t).$$



First let us observe that from the linearity of this 
equation it can easily be shown that if all of the ensemble 
distributions of f are Gaussian, all the ensemble dis- 
tributions of h will be Gaussian (cf. [2]). However, even 
if f is stationary, h will not be stationary (except for first- 
order statistics as shown below). 
Next, consider the special case that one has if $g_k(t) = 1$
for $t_k < t < t_{k+1}$ and $g_k(t) = 0$ for other values of t. Under
this condition it may be shown that the first time distribution
of h is equal to the first ensemble distribution
of f. For this it is assumed that f is stationary for its
first ensemble distribution and that Theorem 1 applies.
Let $P_f$ and $P_h$ denote the first ensemble distributions of f
and h, respectively. Now, when $t_k < t < t_{k+1}$, $h(t) = f(t_k)$
and hence $P_h(x, t) = P_f(x, t_k)$. Since f is assumed stationary,
$P_f(x, t) = P_f(x)$. Taking the time average is effortless
and if $P_h(x)$ denotes the time distribution of h,
Theorem 1 gives $P_h(x) = P_f(x)$ which was to be shown.³
If the sampling points are far apart and these flat $g_k$ are
used, h must have its power concentrated at low frequencies
even though f may have its power concentrated
at high (more easily generated) frequencies. Thus this
sampler can be viewed as a spectrum compressor; for
additional discussion of this point see [17]. A sampler was
built for this case and the above result, $P_h = P_f$, was
checked using a Gaussian f and a special low-frequency
distribution analyzer. The agreement was good.
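The equality of the first time distribution of h and the first ensemble distribution of f can be imitated numerically. The sketch below is only an illustration under stated assumptions: the sampled values are drawn as independent unit-variance Gaussian numbers (standing in for widely separated samples of a stationary Gaussian f), and the hold intervals are drawn at random, independently of f. The function names, the seed, and the interval range are hypothetical.

```python
import math
import random

def held_time_distribution(x, n_samples=100_000, seed=0):
    """Fraction of total time that a flat sample-and-hold output h stays at or below x."""
    rng = random.Random(seed)
    time_below = 0.0
    total_time = 0.0
    for _ in range(n_samples):
        f_k = rng.gauss(0.0, 1.0)      # assumed sampled input value f(t_k)
        hold = rng.uniform(0.5, 1.5)   # assumed hold duration t_{k+1} - t_k
        total_time += hold
        if f_k <= x:
            time_below += hold         # h equals f(t_k) for the whole interval
    return time_below / total_time

def gaussian_cdf(x):
    """First ensemble distribution of the assumed unit-variance Gaussian f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for level in (-1.0, 0.0, 0.5):
    print(level, held_time_distribution(level), gaussian_cdf(level))
```

The two printed columns should agree to within sampling error, which shrinks as `n_samples` grows.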

Finally, to make the calculation look less trivial, let
us consider another form for the $g_k$. The value $f(t_k)$ is
usually held on a capacitor; however, there is some discharging
between samples. This can be accounted for by
taking $g_k(t) = \exp[(t_k - t)/\tau]$ for $t_k < t < t_{k+1}$. For said
$g_k$, $h(t) \le x$ if, and only if, $f(t_k) \le x \exp[(t - t_k)/\tau]$; thus for
$t_k < t < t_{k+1}$, we have $P_h(x, t) = P_f(x e^{(t - t_k)/\tau}, t_k)$. Now
let f be stationary for $P_f$ and let $t_{k+1} - t_k = T$ for each
k. Then $P_h(x) = A\,P_h(x, t)$ leads to

$$P_h(x) = \frac{1}{T}\int_{0}^{T} P_f\left(x e^{t/\tau}\right) dt.$$

In this case, even if $P_f$ is Gaussian, $P_h$
will not be Gaussian. To look at this result for a Gaussian
f it is more convenient to consider density distributions.
Let $W_h$ and $W_f$ denote the density distributions for h and
f, e.g., $W_h = \partial P_h/\partial x$. Then assuming that differentiation
under the integral sign is permitted, we get

$$W_h(x) = \frac{1}{T}\int_{0}^{T} W_f\left(x e^{t/\tau}\right) e^{t/\tau}\, dt.$$

When $W_f$ is a Gaussian density distribution having zero
mean value and standard deviation $\sigma$, it is easy to show that this gives

$$W_h(x) = \frac{\tau}{Tx}\left[\operatorname{erf}\left(\frac{x e^{T/\tau}}{\sigma}\right) - \operatorname{erf}\left(\frac{x}{\sigma}\right)\right]$$

where

$$\operatorname{erf}(x) = (2\pi)^{-1/2}\int_{0}^{x} \exp(-y^2/2)\, dy.$$

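The closed form for the density of the discharging sampler can be checked against direct numerical integration. The sketch below assumes that $W_f$ is a zero-mean Gaussian density with standard deviation `sigma`, and it writes the bracketed difference in terms of the standard normal distribution function, which gives the same difference as the unit-Gaussian integral called erf in the text (Python's `math.erf` uses a different normalization, so it is converted first). The parameter values and function names are arbitrary illustrative choices.

```python
import math

def gaussian_pdf(y, sigma):
    """Zero-mean Gaussian density W_f with standard deviation sigma."""
    return math.exp(-0.5 * (y / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def gaussian_cdf(z):
    """Standard normal distribution function, built from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def w_h_integral(x, sigma, tau, T, n=2000):
    """Simpson's rule for (1/T) * integral over [0, T] of W_f(x e^{t/tau}) e^{t/tau} dt; n must be even."""
    step = T / n

    def integrand(t):
        growth = math.exp(t / tau)
        return gaussian_pdf(x * growth, sigma) * growth

    total = integrand(0.0) + integrand(T)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * step)
    return total * step / (3.0 * T)

def w_h_closed(x, sigma, tau, T):
    """Closed form (tau / (T x)) * [Phi(x e^{T/tau} / sigma) - Phi(x / sigma)], valid for x != 0."""
    upper = gaussian_cdf(x * math.exp(T / tau) / sigma)
    lower = gaussian_cdf(x / sigma)
    return tau / (T * x) * (upper - lower)

for x in (0.3, -0.5, 1.2):
    print(x, w_h_integral(x, 1.0, 2.0, 1.0), w_h_closed(x, 1.0, 2.0, 1.0))
```

The closed form is singular at x = 0 as written, so the comparison is made only at nonzero x.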

Another point that could be made here is that the
equation for $W_h$ given above is equivalent to $W_h(x) =
A\,W_h(x, t)$; hence, with the required interchange of
differentiation and integration, the conclusion of Theorem
1 has been extended to a “statistical quantity” other than
those falling under the formulations in the Introduction.
This point is given additional discussion in Section III-A.


B. Indecomposability Implies Uniformity 


We first make some preliminary definitions. With $\alpha$
and $\tau$ fixed, $F(\alpha, t + \tau)$ is a function of t; a point $\theta$ in $\Gamma$
is a $\tau$ translate of $\alpha$ for F if $F(\theta, t) = F(\alpha, t + \tau)$ for all
t (if any such $\theta$ exists). The trajectory of $\alpha$ is the set of all
translates found for all real $\tau$. In other words, a trajectory
is a set of points in $\Gamma$ such that the corresponding functions
of t furnished by F are the same, except for phase. A
process is indecomposable for F if measurable collections of
trajectories for F always have either measure zero or one.
Before giving an example of a (nontrivial) indecomposable
space, the important theorem is given.


³ The relation is not so simple for higher-order distributions.


Theorem 2: If the process is indecomposable for F and
A(F) is measurable, the process is uniform for F.

To prove this, we first note that A(F) is the same for
every point on any particular trajectory,⁴ i.e., the time
average of a function is independent of its phase. Thus,
the set of points in $\Gamma$ having A(F) < C is a set of trajectories.
Indecomposability with the measurability of
A(F) implies that this set is of measure zero or one,
depending on C. This implies that A(F) as a function of
$\alpha$ is constant a.e., i.e., the process is uniform for F.

To get some insight as to the nature of indecomposable
spaces, an example will be given. If the individual trajectories
are measurable, a space $\Gamma$ either has a single
trajectory of measure one or an (uncountable) infinity of
trajectories, each of measure zero. The single trajectory
case might be considered trivial, though sometimes of
interest. Illustrating a nontrivial example is important
since it shows that one may have indecomposability (even)
when the models in the ensemble differ by more than
merely time translations. A. Novikoff suggested the
following example. The space $\Gamma$ is taken as the unit
square in the (x, y) plane where it is assumed that there
is a model corresponding to each $\alpha = (x, y)$ in the square.
Fig. 2 depicts the space with part of a typical trajectory
drawn in. Let $\alpha$ be a fixed point in the square, say (1/2, 1/2);
the $\tau$ translate of $\alpha$ is assumed to be on a line of irrational
slope, say $\sqrt{2}$, at a distance equal to $\tau$ along the indicated
trajectory where positive $\tau$ rides upward. If the distance $\tau$
causes “puncture” of side I, the trajectory returns on
the x axis at the puncture abscissa. Starting with another
point not on this trajectory, say (1/2, 1/4), another trajectory
is formed using the same slope, etc. For this set of trajectories
it can be shown that any collection of trajectories
will be of Lebesgue measure one, zero, or
nonmeasurable. An F that induces the above set of
trajectories is the following, where $\alpha = (x, y)$:

$$F(\alpha, t) = \sin 2\pi\left(x + t/\sqrt{3}\right) + \sin 2\pi\left(y + \sqrt{2}\,t/\sqrt{3}\right).$$


Fig. 2. Example of indecomposable space (unit square in the (x, y) plane with corners (0,1), (1,1), and (1,0) marked and the upper edge labeled SIDE I).


⁴ This theorem also holds for the other useful definitions mentioned
for A(F) in the Introduction.
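The translate relation for the function F given with Fig. 2 can be checked directly. The sketch below is an illustration under an explicit assumption: the point reached after a distance $\tau$ is obtained by wrapping both coordinates back into the unit square (modulo 1), which is one reading of the "puncture" rule. The helper names `translate` and `time_average`, the sample points, and the averaging window are hypothetical. The time average uses the symmetric form $(1/2T)\int_{-T}^{T}$ at a finite T.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT3 = math.sqrt(3.0)

def F(alpha, t):
    """F(alpha, t) = sin 2*pi*(x + t/sqrt(3)) + sin 2*pi*(y + sqrt(2)*t/sqrt(3))."""
    x, y = alpha
    return (math.sin(2.0 * math.pi * (x + t / SQRT3))
            + math.sin(2.0 * math.pi * (y + SQRT2 * t / SQRT3)))

def translate(alpha, tau):
    """Assumed tau translate: move a distance tau along slope sqrt(2), wrapping into the unit square."""
    x, y = alpha
    return ((x + tau / SQRT3) % 1.0, (y + SQRT2 * tau / SQRT3) % 1.0)

def time_average(alpha, T=1000.0, n=200_000):
    """Midpoint-rule approximation of (1/2T) * integral of F(alpha, t) over [-T, T]."""
    dt = 2.0 * T / n
    return sum(F(alpha, -T + (k + 0.5) * dt) for k in range(n)) * dt / (2.0 * T)

alpha = (0.2, 0.7)
theta = translate(alpha, 5.3)
# theta should satisfy F(theta, t) = F(alpha, t + tau) for every t.
max_gap = max(abs(F(theta, t) - F(alpha, t + 5.3)) for t in (-3.0, 0.0, 1.5, 8.25))
print(max_gap, time_average(alpha), time_average((0.9, 0.1)))
```

For this F the time average is the same (zero) for every starting point, so the last two printed values should both be close to zero, in line with uniformity.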


C. Ergodicity 


Normally one wants a process to be uniform (and/or
stationary) for a large class of statistics, i.e., for a large
class of functions H and for all values of the $t_k$’s, $u_j$’s, $\ldots$,
and P. Strong forms of indecomposability and stationarity
will bring this about.

One may consider the function of $\alpha$ and t formed by
taking the ordered n-tuple of nondeterminate noise
ensembles. Thus,

$$K(\alpha, t) = [f_\alpha(t), g_\alpha(t), \ldots, h_\alpha(t)].$$



A process is basically indecomposable if it is indecomposable 
for K. For the remainder of this paper it is convenient 
to think of the process as consisting of a set of non- 
determinate ensembles, such as resistor noise sources, 
and a set of determinate ensembles, such as the intended 
signals. Then, in the definition of K, only the nondetermi- 
nate signals are included. The situation presented by the 
mixture is developed in the next section. 


Theorem 3: If the process is basically indecomposable,
it is indecomposable for any H. In turn, in accordance
with Theorem 2, the process will be uniform for any H.

To prove this, one may observe that a $\tau$ translate of $\alpha$
for K is necessarily a $\tau$ translate for H, that is, if

$$[f_\alpha(t + \tau), \ldots, h_\alpha(t + \tau)] = [f_\theta(t), \ldots, h_\theta(t)]$$

for all t, with $\alpha$, $\theta$, and $\tau$ fixed, then

$$H[f(\alpha, t + \tau, n), \ldots, h(\alpha, t + \tau, l), P] = H[f(\theta, t, n), \ldots, h(\theta, t, l), P].$$

From this it follows that each trajectory for H will
consist of a collection of sets of trajectories for K. In turn,
the indecomposability for K will imply indecomposability
for H.

We now want to define a strong form of stationarity.
If B is a set of points in $\Gamma$, let $B_\tau$ denote the set of all
$\theta$’s in $\Gamma$ such that there exists an $\alpha$ in B for which $\theta$ is a
$\tau$ translate of $\alpha$ for K. The process is strictly stationary
(a weaker definition is sometimes used, cf. [4]) if for any
measurable B, $B_\tau$ is measurable and their measures are
equal for all $\tau$. If the F given with Fig. 2 is equal to K
(i.e., there is just this one noise in the process), the
example given there is a strictly stationary process.


Theorem 4: If the process is strictly stationary and E(H)
exists, E(H) is independent of t, i.e., the process is stationary
for H.

Because of the assumed stationarity, the values of H
merely move along trajectories as “t” varies, and the
amount of the measure space covered by each value
interval of H remains constant. From this consideration
it is not difficult to see that E(H) will be independent of
time.

A process is ergodic if it is basically indecomposable and 
strictly stationary. Combining Theorems 1, 3, and 4 we 
have the following. 


Theorem 5: Let H be any function over the nondeterminate
ensembles, let E(H) and A(H) exist with A(H) measurable,
and let the process be ergodic; then A(H) = E(H)
a.e.

One can observe that the theory developed here would
be rather empty if time statistics seldom existed. Ergodic
theorems give conditions under which these time averages
exist in some sense. Only the two most famous ergodic
theorems (cf. [10], p. 410) are stated. First assume that
the process is strictly stationary. Also, for each α in I and
each τ it is assumed that there is one, and only one,
τ translate of α for K. Birkhoff’s ergodic theorem then
concludes that the additional condition that E(H) exists
implies that A(H) exists a.e. (and is finite). Von Neumann’s
ergodic theorem concludes that if $E(H^2)$ exists,
$\operatorname{l.i.m.}_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} H\,dt$ exists. These theorems are quite deep
and the proofs are rather long, requiring many results
from integration theory.

D. Independence Over Time


By considering examples, it can be shown that a process
may not be indecomposable for joint statistics of
determinate and nondeterminate ensembles. This can easily
be the case even if the ensemble is enlarged in a reasonable
way to include all translations of the originally determinate
signals. Thus, the crucial property of uniformity must
result from some other condition. Some examples display
the fact that independence over time is this condition.

To simplify this section it will be assumed that there is
only one determinate ensemble g(t) and only one
nondeterminate ensemble $f_\alpha(t)$. It is easy to generalize here
to include many ensembles in these classes, but the
equations become somewhat lengthy. The problem to be
considered here is that of determining joint time statistics
of f and g, i.e., one has a function, H[f(α, t, n), g(t, m); P],
and desires to show that the process is uniform for H.
Then Theorem 1 can be used to give A(H) = AE(H).
To simplify the notation, the n, m, and P in H are not
written in this section. If h is a function of f and g (as
in the examples of Section III), statistics of h are
included as special cases of joint statistics of f and g.
Finally, it is convenient to append a subscript to the
time average operator to indicate which variable is being
averaged over, e.g.,


$$A_s H[f(\alpha, s), g(t)] = \lim_{T \to \infty} \frac{1}{2T} \int_0^T H[f(\alpha, s), g(t)]\,ds + \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^0 H[f(\alpha, s), g(t)]\,ds.$$
Now let α be a particular member of the ensemble; then
$f_\alpha$ and g are independent over time for H if


$$A_t H[f(\alpha, t), g(t)] = A_s A_r H[f(\alpha, s), g(r)] = A_r A_s H[f(\alpha, s), g(r)].$$


A similar definition of independence over the ensemble 
can be made; however, this type of independence is not of 
interest here and hence independence means the above 
type in this paper. 

It is important physically that the time statistic not 
be sensitive to the relative phase of $f_\alpha$ and g. Investigating
this leads to Theorem 6, which is an alternative definition 
of independence. Also, it should be noted that the above 
definition of independence is essentially equivalent to the 
more conventional definition given as the equality of 
joint distributions and the product of individual dis- 
tributions. 


Theorem 6: A(H) is independent of the phase of g if $f_\alpha(t)$
and g(t + c) are independent for all c. It is assumed here
that certain interchanges⁵ of operators are permitted.

The formal proof is easy and is omitted. An interesting
point about this theorem is that there is no analogous
theorem when the real line domain for t is replaced with
an arbitrary measure space not having a notion of
translation associated with it.


Theorem 7: Let the process be ergodic (i.e., let the process
be strictly stationary and indecomposable for f); then,
granting some interchanges, the process is uniform for
H if, and only if, $f_\alpha$ and g are independent for H for almost
all α.

To prove the “if” part of this theorem we have an
ergodic process with f and g independent for H. Thus,


$$A(H) = A_s A_r H[f(\alpha, r), g(s)].$$


Here $A_r(H)$ is a time statistic of f, since the values of g
are fixed during the r integration, and hence, by ergodicity,
this time average is equal to E(H) a.e. This gives


$$A(H) = A_s E H[f(\alpha, r), g(s)],$$


which says (among other things) that A(H) is the same for
almost all α, which was to be shown.

It might be observed that the E(H) part is independent
of r by the assumed stationarity.

For the “only if” part of the theorem, we have an
ergodic process with A(H) constant a.e. By Theorem 1,


$$A(H) = A_t E H[f(\alpha, t), g(t)].$$


During the expected value integration, the g values are
fixed and hence E(H) can be considered an ensemble
statistic of f. Then by ergodicity, E(H) can be replaced
by the average over the t appearing in H. Thus, we get


$$A(H) = A_t A_s H[f(\alpha, s), g(t)].$$
Granting a final interchange, $f_\alpha$ and g are independent for
almost all α, which completes the proof.

5 Lemmas, such as the one given with Theorem 1, can be used
to justify the interchanges; however, these add much length and
little content to the points of interest here.

The above theorem displays the importance of in- 
dependence and the next theorem adds some insight. 


Theorem 8: Let the process be indecomposable for f and let
$f_\alpha$ and g be independent for H a.e.; then the process is
uniform for H, assuming A(H) is measurable.

To prove this it is first shown that A(H) is constant on
a trajectory. Let β be a τ translate of α. Then A(H) at β
is given thus (by independence and the notion of
τ translate):


$$A_r A_s H[f(\beta, s), g(r)] = A_r A_s H[f(\alpha, s + \tau), g(r)].$$


But it is easy to see that $A_s(H)$ here is independent of τ,
i.e., time averages are not sensitive to phase. Hence, this
last form is equivalent to


$$A_r A_s H[f(\alpha, s), g(r)] = A(H)$$


at α, which was to be shown. Now as was done with
Theorem 2, the set of α’s for which A(H) < C is a collection
of trajectories for f. By the assumed indecomposability
this set is of measure zero or one depending on C, i.e.,
A(H) is constant a.e., which completes the proof.


III. SAMPLE CALCULATIONS


This section contains calculations of the first time 
distribution of a sine wave amplitude modulated by 
Gaussian noise and of a periodic wave angularly modulated 
by “almost any” signal. For simplicity these calculations 
have been restricted to first-ordered statistics; of course, 
Section II is applicable to any statistic. Experimental 
verifications are included. 


A. Amplitude Modulated Sine Wave 


Here we take $h_\alpha(t) = f_\alpha(t) \sin \omega t$. Since h is a linear
transformation of f, h will have Gaussian ensemble
statistics when f is Gaussian. However, it is shown in this
section that the nonstationarity of h renders its time
statistics strikingly non-Gaussian. Let $W_f$ be the first
ensemble density distribution of f and let $P_f$ be the
corresponding ensemble cumulative distribution.⁶ Of
course, $W_f(z, t) = \partial/\partial z\, P_f(z, t)$. Let $W_h$ and $P_h$ be the
corresponding functions for h. If $\sin \omega t \neq 0$ we have
$h_\alpha(t)$ in the interval (z, z + dz) if $f_\alpha(t)$ is in the interval
$(z/\sin \omega t,\ z/\sin \omega t + dz/\sin \omega t)$. From this it follows that
$W_h(z, t) = W_f(z/\sin \omega t, t)\, |\sin \omega t|^{-1}$. If f is stationary
for $W_f$ and we set $W_h(z) = A\, W_h(z, t)$, the above equation
gives


$$W_h(z) = \frac{1}{2\pi} \int_0^{2\pi} W_f(z/\sin\theta)\, |\sin\theta|^{-1}\, d\theta.$$


6 It is assumed that $P_f$ is absolutely continuous.

Before evaluating this for a Gaussian $W_f$, let it be
noted that there is a problem posed here. Density
distributions are not statistics of the form H given in the
Introduction, but cumulative distributions are such
statistics. As mentioned in the example at the end of
Section II-A, the technique in the conclusion of Theorem
1 can be extended to density distributions if the appropriate
integration and differentiation can be interchanged.
This may be improvised as needed. In particular, it is
justifiable in the case here and in the example of Section
II-A under the single assumption that $P_f$ is absolutely
continuous. Continuing in this manner, many important
extensions can be formulated. For example, one may
formulate the power density spectrum over time of a
nonstationary noise f as the time average of the Fourier
transform of the ensemble correlation. Thus,


$$A_t \int_{-\infty}^{\infty} \exp(j\omega\tau)\, E[f_\alpha(t + \tau) f_\alpha(t)]\, d\tau.$$


Here we interchange the Fourier and $A_t$ operators.
Completing the above calculation, when $W_f$ is Gaussian
having zero average value, gives


$$W_h(z) = \frac{2}{\pi} \int_0^{\pi/2} (\sqrt{2\pi}\,\sigma)^{-1} \exp\left(-\frac{z^2}{2\sigma^2 \sin^2\theta}\right) \csc\theta\, d\theta.$$


Letting $y = (\cot\theta)^2$, one gets the form given in Gröbner
and Hofreiter [5], p. 57. The result is


$$W_h(z) = (\pi\sqrt{2\pi}\,\sigma)^{-1} \exp(-z^2/4\sigma^2)\, K_0(z^2/4\sigma^2).$$
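The closed form can be checked numerically against the time-average integral it came from. The following Python sketch is only an illustration and is not part of the original analysis: the function names are hypothetical, $K_0$ is evaluated from the standard integral representation $K_0(x) = \int_0^\infty \exp(-x \cosh t)\,dt$ rather than from a library routine, and the step counts are arbitrary choices that are adequate for the moderate values of z/σ used here.

```python
import math


def bessel_k0(x, steps=8000, t_max=20.0):
    """Approximate K_0(x), x > 0, by the trapezoid rule applied to
    K_0(x) = integral_0^inf exp(-x * cosh(t)) dt (truncated at t_max)."""
    h = t_max / steps
    total = 0.5 * math.exp(-x)  # half weight at t = 0
    for k in range(1, steps + 1):
        total += math.exp(-x * math.cosh(k * h))
    return total * h


def density_closed_form(z, sigma):
    """W_h(z) = exp(-z^2/4s^2) K_0(z^2/4s^2) / (pi sqrt(2 pi) s)."""
    u = z * z / (4.0 * sigma * sigma)
    return math.exp(-u) * bessel_k0(u) / (math.pi * math.sqrt(2.0 * math.pi) * sigma)


def density_by_time_average(z, sigma, steps=20000):
    """Midpoint-rule value of
    (2/pi) * integral_0^{pi/2} N(z / sin(theta); 0, sigma) / sin(theta) d(theta)."""
    h = (math.pi / 2.0) / steps
    norm = math.sqrt(2.0 * math.pi) * sigma
    total = 0.0
    for k in range(steps):
        s = math.sin((k + 0.5) * h)
        total += math.exp(-z * z / (2.0 * sigma * sigma * s * s)) / (norm * s)
    return (2.0 / math.pi) * total * h


if __name__ == "__main__":
    for z in (0.5, 1.0, 2.0):
        print(z, density_closed_form(z, 1.0), density_by_time_average(z, 1.0))
```

The two columns printed for each z should agree to several significant figures, which is a check on the substitution leading to the Bessel-function form.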


Fig. 3 gives this density distribution along with a
Gaussian distribution having the same standard deviation.
Fig. 4 shows these same data taken experimentally. The
multiplication was done by mixing a sine wave at X band
and a low frequency noise in a magic tee. The signal was
then beaten down to a 30-mc carrier and analyzed on an
analyzer similar to the one described in [13]. Fig. 5
illustrates the quality of the multiplication.


B. Angle Modulated Waves 


Let $h_\alpha(t) = F[\omega t + f_\alpha(t)]$ where F is periodic with
period a and ω is constant. The first probability
distribution over time of h is calculated in this part. In
fact, it is shown that this distribution is not influenced by
a large class of modulating signals. However, the higher
order distributions are influenced by the modulation.
Since F is periodic, ωt can be replaced with a saw-tooth
wave g(t) where g(t) = ωt for 0 ≤ t < a/ω and g has
period a/ω.

Drawing on Section II-D, it may be assumed that $f_\alpha$
and g have a certain amount of independence; it is easy
to choose $f_\alpha$ dependent on g in ways that cause the
distribution of the modulated wave to differ from the
distribution of the unmodulated wave. The problem is to
calculate the first distribution over time of $h_\alpha$. If we let
$P_h$ denote this distribution (assumed to be the same for
all α) and let H(v) = 1 if v > 0 and H(v) = 0 if v ≤ 0,
then


$$P_h(z) = A_t H\{z - F[g(t) + f_\alpha(t)]\}.$$


Fig. 3—Density distribution of sine wave amplitude modulated by
Gaussian noise.


The first method of doing the calculation is to assume that
$f_\alpha$ and g are independent for H; then


$$P_h(z) = A_s A_t H\{z - F[g(t) + f_\alpha(s)]\}.$$


However, $A_t H$ is recognized as the first distribution of
the phase-shifted wave F[g(t) + θ] where θ = $f_\alpha(s)$.
Clearly, by the periodicity of F, this phase shift does not
change the distribution. Thus, if $P_F$ denotes the
distribution of the unmodulated wave,

$$P_h(z) = A_s P_F(z) = P_F(z).$$
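The step that a fixed phase shift leaves the first distribution of a periodic wave unchanged is easy to confirm numerically. The sketch below is an illustration under stated assumptions: F is taken to be a sine wave of period 2π, the saw-tooth g is replaced by a uniform sweep over one period, and the function name and sample count are hypothetical.

```python
import math


def first_time_distribution(F, period, z, phase=0.0, samples=100000):
    """Fraction of one period for which F(u + phase) < z, with u swept
    uniformly over [0, period). This plays the role of
    A_t H{z - F[g(t) + phase]} with H(v) = 1 for v > 0 and 0 otherwise."""
    count = 0
    for k in range(samples):
        u = (k + 0.5) * period / samples
        if z - F(u + phase) > 0.0:
            count += 1
    return count / samples


if __name__ == "__main__":
    period = 2.0 * math.pi
    for phase in (0.0, 1.234, -2.5):
        print(phase, first_time_distribution(math.sin, period, 0.3, phase))
```

Each phase should give nearly the same value, close to 1/2 + arcsin(0.3)/π, the first distribution of an unmodulated sine wave at z = 0.3.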


The type of independence assumed above leaves something
to be desired in that it depends on F, i.e., one
could have the independence for certain F’s but not have
it for others. It will now be shown that if g and $f_\alpha$ have
ordinary first-order independence, $P_h(z) = P_F(z)$ for
any measurable F. Here it is assumed that the joint
distribution of $f_\alpha$ and g over time is equal to the product
of the first distributions of f and g. Using the basic theorem
on calculating average values from distributions (cf.
[4], p. 12) we have


$$P_h(z) = A_t H\{z - F[g(t) + f_\alpha(t)]\} = \iint H[z - F(x + y)]\, dP_{gf}(x, y).$$

The assumed independence gives


$$P_h(z) = \iint H[z - F(x + y)]\, dP_g(x)\, dP_f(y).$$

Fig. 4—Density distribution of sine wave amplitude modulated by
Gaussian noise. Double exposure. Receiver (Gaussian) noise and
mixed noise. Compare Fig. 3.




Fig. 5—Synchronized pictures of modulation and modulated signals. 
(a) Modulating signal. (b) Modulated signal. 


Since g(t) is a saw-tooth,

$$P_h(z) = \int_{-\infty}^{\infty} \left[ \frac{1}{a} \int_0^a H[z - F(x + y)]\, dx \right] dP_f(y).$$


Because of the periodicity of F, the term in brackets is
independent of y and we have (observing that the term
in brackets is equal to $P_F$)


$$P_h(z) = P_F(z) \int dP_f(y) = P_F(z).$$


This result was experimentally verified by using a sine 
wave for F and a variety of modulating signals ($f_\alpha$).
The results are shown on Fig. 6. The frequency modulated 
signal (h) was produced by mixing two klystron outputs. 
One klystron was at a fixed frequency while the frequency 


Fig. 7—Apparatus for frequency modulation. 
Summary—An n-place binary parity check code which corrects
up to and including e errors in each code letter is fully described
by its n characteristics, which are r-dimensional vectors, where r
is the number of redundant binits in each code letter. It is shown
that the characteristics of such a code have the essential property
that any subset of 2e of them are linearly independent. An upper
bound on r for fixed n and e is obtained by consideration of a
systematic procedure for finding the characteristics; this upper bound
is always less than, or equal to, twice the lower bound of Hamming.¹


INTRODUCTION 


TWO notions are central to the class of error-correction
codes to be considered here. One notion is
that of error location by means of systematic 
parity checks,’ and the other is that of generation of code 
letters by presuming they form an Abelian group.” These 
two notions, which have been proved equivalent,” may be 
used extensively in the study of error-correction codes 
which meet the following specifications: 


1) The total number of binits in each code letter,
including both information binits and redundant
binits, is n.

2) The number of binits altered in a code letter during
transmission is less than or equal to e.

3) If the number of redundant binits is r, then every
instance of error satisfying specification 2) must be
uniquely identified by a corrector consisting of r
binits.

4) Each redundant binit of a code letter is a linear
sum (modulo 2) of one or more information binits.


THE LINEAR INDEPENDENCE OF CHARACTERISTICS


A parity-check code is completely specified by its 
characteristics’ which are actually a tabulation of the 
decoding relations. If each code letter consists of n binits, 
r of which are redundant, then the code is described by n 
characteristics, each of which is an r-binit number. The 
essential nature of the characteristics associated with a 
given code lies in the degree to which they are linearly 
independent. If e is the maximum number of binits which 
may be altered during transmission of a code letter, then 
any set of not more than 2e characteristics is linearly
independent, as will be shown.
If the characteristics are regarded as r-dimensional
vectors, all of whose components are zero or one, then
addition of characteristics is performed according to the 
usual rules of vector addition with the appropriate 
modification that addition of components is performed 
(modulo 2). The kth decoding relation which produces 
the kth binit of the corrector is obtained from the kth 
components of the n characteristics. If the kth component 
of the jth characteristic is one, then the jth binit of the 
received code letter is included in the kth decoding 
relation. Thus, those binits (of the received code letter) 
whose sum (modulo 2) constitutes the kth parity check 
are singled out by the nonzero kth components of the n 
characteristics. 

The r-encoding relations are obtained in a similar 
fashion from the n characteristics. The decoding relations 
applied to a code letter prior to transmission yield a 
corrector all of whose components are zero. If r of the n 
characteristics are chosen to be the unit vectors of 
r-dimensional space, then these characteristics will 
correspond to the redundant binits of the code letter while 
the remaining characteristics will correspond to the 
information binits. The nonzero kth components of the 
characteristics corresponding to the information binits 
will then single out those information binits of the code 
letter whose sum (modulo 2) is the kth redundant binit 
of the code letter. 

The theorem concerning the characteristics of a parity
check code may be stated as follows. The n-place, binary
parity check code generated by n characteristics will
correct all single, double, ..., and e-tuple errors if and
only if every set of 2e characteristics is linearly
independent. For proof let $a_1, a_2, \ldots, a_n$ be the n binits of a typical
code letter, and let $\alpha_j = (\alpha_{j1}, \alpha_{j2}, \ldots, \alpha_{jr})$ be the jth
characteristic of the code. Then the following relations
hold for the code letter prior to transmission.


$$\alpha_{11} a_1 \oplus \alpha_{21} a_2 \oplus \cdots \oplus \alpha_{n1} a_n = 0$$
$$\vdots$$
$$\alpha_{1r} a_1 \oplus \alpha_{2r} a_2 \oplus \cdots \oplus \alpha_{nr} a_n = 0.$$


The corrector γ is an r-dimensional vector whose ith digit
is obtained from


$$\gamma_i = \sum_{j=1}^{n} \alpha_{ji}\, a_j' \pmod 2,$$


where $a_j'$ is the value of $a_j$ after transmission. Suppose $a_k$
is the only binit of the code letter which is altered during
transmission; the corrector will then be the kth characteristic,
$\alpha_k$. Suppose $a_{j_1}, a_{j_2}, \ldots, a_{j_p}$ (where p ≤ e) are the
binits which are altered during transmission. Then the
corrector is $\gamma = \alpha_{j_1} \oplus \alpha_{j_2} \oplus \cdots \oplus \alpha_{j_p}$. Two distinct,
allowable instances of error will have the same corrector if and
only if some set of not more than 2e characteristics is
linearly dependent.
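The corrector computation and the independence condition can be illustrated with a short Python sketch. It is a sketch under assumptions that are not in the paper: each characteristic is stored as an r-bit integer so that vector addition modulo 2 becomes bitwise XOR, the function names are hypothetical, and the example code is the familiar seven-binit single-error-correcting code whose characteristics are the integers 1 through 7.

```python
from itertools import combinations


def corrector(characteristics, received):
    """XOR of the characteristics of every position whose received binit is 1.
    This is the vector of parity checks applied to the received code letter."""
    result = 0
    for alpha_j, binit in zip(characteristics, received):
        if binit:
            result ^= alpha_j
    return result


def corrects_up_to(characteristics, e):
    """True if no nonempty set of at most 2e characteristics sums to zero
    (modulo 2), i.e. every such set is linearly independent over GF(2)."""
    for size in range(1, 2 * e + 1):
        for subset in combinations(characteristics, size):
            total = 0
            for alpha in subset:
                total ^= alpha
            if total == 0:
                return False
    return True


if __name__ == "__main__":
    chars = [1, 2, 3, 4, 5, 6, 7]      # three-binit characteristics
    letter = [1, 1, 1, 0, 0, 0, 0]     # a code letter: 1 ^ 2 ^ 3 == 0
    print(corrector(chars, letter))    # zero corrector before transmission
    letter[4] ^= 1                     # alter the fifth binit
    print(corrector(chars, letter))    # corrector equals the fifth characteristic
    print(corrects_up_to(chars, 1), corrects_up_to(chars, 2))
```

With these characteristics the check succeeds for e = 1 and fails for e = 2, since the first three characteristics sum to zero.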

To accomplish decoding, a complete decoding book is 
necessary. This is simply a list of the allowable instances 
of error and the associated unique correctors; the list of 
correctors consists of all possible linear sums of character- 
istics taken not more than e at a time. The characteristics 
for the case n = 32 and e = 2 are given in the final section. 


UPPER AND LOWER BOUNDS ON r


Hamming¹ obtained a lower bound on the minimum
number of redundant binits required to correct up to and
including e errors in a parity-check code where each code
letter consists of n binits:


$$2^r \geq \sum_{i=0}^{e} \binom{n}{i}.$$


An upper bound on r may be obtained by considering a 
crude but systematic procedure for finding the character- 
istics of a code for fixed n and e. 

The first characteristic is chosen arbitrarily, subject 
only to the condition that it not be the null vector. 
The second characteristic is chosen so that it is different 
from the first and the null vector. The third characteristic 
is chosen so that it is different from the first two, from the 
null vector, and from the sum of the first two character- 
istics. The kth characteristic is chosen so that it is different 
from the previously chosen characteristics, from the null 
vector, and from all m-tuple sums of previously chosen 
characteristics where m is any positive integer less than 
or equal to 2e — 1. The whole computation may be 
tabulated as follows. 


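1) $\alpha_1 \neq 0$
2) $\alpha_2 \neq 0,\ \alpha_1$
3) $\alpha_3 \neq 0,\ \alpha_1,\ \alpha_2,\ \alpha_1 \oplus \alpha_2$
   ...
n) $\alpha_n \neq 0,\ \alpha_i,\ \alpha_i \oplus \alpha_j,\ \ldots,\ \alpha_{i_1} \oplus \alpha_{i_2} \oplus \cdots \oplus \alpha_{i_{2e-1}}$,
   with all indices ranging over 1, 2, ..., n − 1.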


Each stage of the computation contains all the entries
of the previous stages. This method of generation
guarantees the needed independence of the characteristics. The
method will not go to completion successfully unless r (the
dimensionality of the characteristics) is sufficiently large.
To find such an r, consider the worst possible outcome of
making arbitrary choices for the α’s. That outcome consists
of all the entries in the nth stage being distinct; to insure
success in the face of this worst outcome, $2^r$ need only be
greater than the total number of entries in the nth stage.
TABLE I


CHARACTERISTIC VECTORS FOR DOUBLE-ERROR CORRECTION
OF 32-BINIT MESSAGES


j α_j

1 1000 0000 0000 0
2 0100 0000 0000 0
3 0010 0000 0000 0
4 0001 0000 0000 0
5 0000 1000 0000 0
6 0000 0100 0000 0
7 0000 0010 0000 0
8 0000 0001 0000 0
9 0000 0000 1000 0
10 0000 0000 0100 0
11 0000 0000 0010 0
12 0000 0000 0001 0
13 0000 0000 0000 1
14 1100 Lit © @ 0000 0 
15 ih OW OM i 1 0000 0 
16 ORO 0000 Le ORO 0 
17 LOR 0000 © i ul 0 
18 Qa LO OREO 0000 0 
19 Oak tw 0001 1000 0 
20 Ones 0000 0110 0 
21 OF LSO 0000 0001 1 
22 iL @) @ al ial i @ 0000 0 
23 TOROR OFOEe 1000 0 
24 i @ @ 0000 ie it T.© 0 
25 1001 0000 ORO 1 
26 IOnOst i @ @ il 0010 0 
27 1001 0100 TOZOR 0 
28 ORO ORO RIO ORTORO 1 
29 LOEOM IOZORO 1000 1 
30 IPOTORE al Oil OsIZORO 0 
31 iL @@ il th ah Ooi 0001 1 
32 ee ea IE Tesh kA, Shoal 1 


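$$2^r > \sum_{i=0}^{2e-1} \binom{n-1}{i}.$$


The smallest integer r that satisfies this last equation
is an upper bound on the minimum number of redundant
binits required to correct not more than e errors in a code
letter consisting of n binits. If {a} denotes the smallest
integer greater than or equal to a, then⁵


$$r_{\max} = \left\{ \log_2 \left[ 1 + \sum_{i=0}^{2e-1} \binom{n-1}{i} \right] \right\}$$
$$r_{\min} = \left\{ \log_2 \left[ \sum_{i=0}^{e} \binom{n}{i} \right] \right\}.$$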


As an example consider a 32-binit code letter in which
all single, double, and triple errors are to be corrected.


$$r_{\max} = \left\{ \log_2 \left[ 1 + \sum_{i=0}^{5} \binom{31}{i} \right] \right\} = 18.$$


It has been found⁶ that the required number of redundant
binits is at least 15.
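Both bounds are simple to evaluate exactly with integer arithmetic. The following Python sketch is an illustration only; the function names are hypothetical, and the ceilings of the base-2 logarithms are taken with `int.bit_length` to avoid floating-point rounding.

```python
from math import comb


def redundancy_lower_bound(n, e):
    """Smallest r with 2**r >= sum_{i=0}^{e} C(n, i) (the lower bound r_min)."""
    total = sum(comb(n, i) for i in range(e + 1))
    return (total - 1).bit_length()


def redundancy_upper_bound(n, e):
    """Smallest r with 2**r > sum_{i=0}^{2e-1} C(n-1, i) (the upper bound r_max)."""
    total = sum(comb(n - 1, i) for i in range(2 * e))
    return total.bit_length()


if __name__ == "__main__":
    for e in (1, 2, 3):
        print(e, redundancy_lower_bound(32, e), redundancy_upper_bound(32, e))
```

For n = 32 and e = 2 this reproduces the values 10 and 13 quoted in the Miscellaneous Remarks, and for e = 3 the upper bound evaluates to 18.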




5 A similar but less fine upper bound is given by E. Gilbert,
“A comparison of signaling alphabets,” Bell Sys. Tech. J., vol. 31,
pp. 504-522; May, 1952.

6 I. Reed, “A class of multiple-error-correcting codes and the
decoding scheme,” IRE TRANS. ON INFORMATION THEORY, no.
PGIT-4, pp. 38-49; September, 1954.
The upper bound is useful only when e is small compared
to n. In that case $r_{\max} - r_{\min}$ will be small compared
to n. This is shown in the following argument in which L
means $\lim_{n \to \infty}$.


$$L\, \frac{r_{\max} - r_{\min}}{n} = L\, \frac{(e - 1) \log_2 n - \log_2 [(2e - 1)!/e!]}{n} = 0.$$


A more interesting property of the upper bound is that
it is always less than or equal to twice the lower bound.

MISCELLANEOUS REMARKS


Experience has shown that when e is small and n is not too
large, the characteristics may be readily found by trial
and error with the help of the bounds on r. The case for
n = 32 and e = 2 was worked by hand and is given in
Table I. It was found that $r_{\min} = 10$
and $r_{\max} = 13$. The choice of $\alpha_1$ through $\alpha_{13}$ was immediate,
and the rest followed with some labor. The complete
decoding book would consist of Table I and 496 other
entries.
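The systematic procedure used to derive the upper bound can also be run mechanically. The Python sketch below is a minimal illustration and is not the hand computation that produced Table I: characteristics are stored as r-bit integers, each new characteristic is taken to be the smallest integer that differs from the null vector and from every sum of at most 2e − 1 earlier characteristics (an arbitrary tie-breaking rule assumed here), and the function names are hypothetical.

```python
from itertools import combinations


def greedy_characteristics(n, e, r):
    """Choose n characteristics of r binits each so that no set of at most
    2e of them sums to zero modulo 2. Raises ValueError if r is too small
    for this particular greedy order to finish."""
    # sums[m] holds every XOR of exactly m distinct chosen characteristics.
    sums = [{0}] + [set() for _ in range(2 * e - 1)]
    chosen = []
    for _ in range(n):
        forbidden = set().union(*sums)
        candidate = next((c for c in range(1, 2 ** r) if c not in forbidden), None)
        if candidate is None:
            raise ValueError("r is too small for this greedy search")
        # Update from the largest size down so each set uses the old sums[m - 1].
        for m in range(2 * e - 1, 0, -1):
            sums[m] |= {s ^ candidate for s in sums[m - 1]}
        chosen.append(candidate)
    return chosen


def independent_in_sets_of(characteristics, size):
    """True if no nonempty set of at most `size` characteristics XORs to zero."""
    for m in range(1, size + 1):
        for subset in combinations(characteristics, m):
            total = 0
            for alpha in subset:
                total ^= alpha
            if total == 0:
                return False
    return True


if __name__ == "__main__":
    chars = greedy_characteristics(32, 2, 13)
    print([format(c, "013b") for c in chars])
    print(independent_in_sets_of(chars, 4))
```

With r = 13 the search for n = 32 and e = 2 cannot fail, because at every stage the forbidden set has at most 4992 members while there are 8192 vectors of 13 binits. The characteristics it returns will generally differ from those of Table I.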


The Utility of a Communication Channel and Applications 


Summary—This paper demonstrates the applicability of the 
functional equations of dynamic programming to information theory 
problems. Yielding the same results as those obtainable by Shan- 
non’s equations, the functional equations can be modified also to 
consider the many restrictions enforced upon information systems 
by the real world. A result of the application of functional equations 
to systems operating under suboptimum conditions is that the infor- 
mation rate of a system is dependent upon the manner in which the 
information is used. 

The Kelly concept—the gain of a gambler who wagers his capital 
on the outcome of a communication channel—is used to determine 
the information rate of the channel. The mathematical analysis 
follows the stochastic multistage decision process technique of Bell- 
man and Kalaba. Together with some extensions by the author, the 
Kelly-Bellman-Kalaba model of communication is repeated. The 
models are analyzed for the optimum case and examined for various 
suboptimum conditions. The gambler’s betting policy is analogous 
to information usage; restrictions upon this policy affect the infor- 
mation rate of the system. They can require that the policy which 
is best under optimum conditions be replaced by other policies 
which, although inferior in the ideal case, are better able to com- 
pensate for the restrictions. 

A null zone reception system first analyzed by Bloom and others
is reanalyzed to provide a concrete example of the latitude of
operation allowed by the functional equation approach. Bloom’s analysis
assumed that the system operates under optimum conditions. His
results are duplicated, and their expression indicates their
alteration by suboptimum conditions. An appendix expresses the results
of this paper in the form used by Bloom.


* Manuscript received by the PGIT, April 29, 1958.
† The Rand Corp., Santa Monica, Calif.


INTRODUCTION 


THE problem of determining the utility of a
communication channel as one within the framework
of the theory of multistage decision processes, or
dynamic programming, has been discussed by Bellman
and Kalaba [1].

To introduce the utility concept, we consider a gambler 
who uses the output of a communication channel in 
connection with the determination of a betting policy 
that will maximize the expected value of the logarithm 
of his capital after N stages of betting. The amount of 
capital that the gambler can acquire using communi- 
cation systems with various properties gives us a quanti- 
tative measure by which these systems can be compared. 
In practice, we are often given a fixed communication
channel and the task of deciding how it can be used most 
efficiently. These problems can be treated by maximizing 
Shannon’s equations [3] for the rate of a channel in bits 
per symbol. 

We see that the utility concept employed in a dynamic 
programming approach yields the same results that are 
obtained by using Shannon’s equation. In addition, the
equations of dynamic programming are easy to formulate 
and admit a simple solution via digital computers. Often, 
in fact, elementary analytic solutions may be obtained. 
These equations are also easy to interpret—a character- 
istic usually absent in information theory due to a number 
of complexities inherent in its methods. 

We also consider the effects of placing certain restric- 
tions upon the manner in which the information taken 
from the communication system can be utilized. By doing 
this, it is seen that the information content of a system 
is dependent upon the way in which the information is 
used. For communications components with variable 
parameters, we see that the setting of a component for a 
theoretically optimal utilization of information is not 
necessarily the best setting if we are forced to operate 
in some suboptimal situations. Dynamic programming 
techniques are used in these demonstrations. These 
techniques and the utility concept add a breadth to 
information theory that enables it to be adapted to many 
realistic situations. 

Two problems are considered in this paper. The first 
one is a hypothetical gambling situation that is used 
to demonstrate the above assertions. The second problem 
is a reanalysis and extension of a communication problem 
solved by Bloom, et al. [2]. This is used to show the
applicability of these ideas to a concrete situation. In both
cases, we show that some theoretically optimal procedures 
are not as good as some suboptimal procedures when 
certain restrictions are placed upon the manner in which 
the information obtained from the system can be
employed.

The work on the reanalysis of Bloom’s paper [2] was 
done with Bellman, Kalaba, and M. Juncosa. Their ideas 
and opinions were also very helpful in the formation of 
the remainder of the paper. 


THE GAMBLING PROBLEM 


In the first problem, the reasoning follows that given 
in Bellman and Kalaba [1], [5]. 

We are presented with the following situation. The a
priori probability that a future event will be successful
is 1/2. A gambler has a communication system on which 
he receives a positive pulse if the event is successful and 
a negative pulse if the event is not successful. However, 
because the channel is noisy, the gambler only knows 
with probability p that if a positive pulse is received, then 
a positive pulse has been transmitted. On the basis of this 
information, the gambler wagers a fraction of his capital 
on the success or failure of each event and tries to
maximize the expected value of the logarithm of his final
capital after N stages of betting.

Consider now three ways in which the communication 
system might be employed. 


1) The gambler could require one pulse for each event, 
and bet on the information conveyed by this pulse. 

2) The gambler could require two pulses for each event, 
and bet only if both pulses (each of which has a 
probability p of being correct) agree. In this case, 
between two and four pulses are the expected number 
required for each bet. 

3) The gambler could require two pulses for each event 
and add them (symmetrical, additive, Gaussian 
noise will be assumed). The policy employed here 
is similar to the policy employed in Case 1) except 
that the gambler would have a greater assurance 
of the outcome of the event, but would pay for it 
by requiring two pulses for each bet. 


MATHEMATICAL FORMULATION 


The methods of dynamic programming will be used. 
We shall determine the gambler’s gain if optimal policies 
are followed in each of the above processes. The analysis 
pertaining to Case 1) can be found in Bellman and Kalaba 
[1]; it will be repeated here for the sake of completeness 
of this report. 

We define for all three cases, 


$f_N(x)$ = the expected value of the logarithm of the
gambler’s capital after N pulses have been
transmitted when the initial capital is x.


Let y (0 ≤ y ≤ x) be the amount that the gambler
wagers at each stage of betting. Then in Case 1), we have


$$f_N(x) = \max_{0 \le y \le x} [p f_{N-1}(x + y) + q f_{N-1}(x - y)], \qquad (1)$$
$$f_1(x) = \max_{0 \le y \le x} [p \log (x + y) + q \log (x - y)], \qquad (2)$$


where p + q = 1.

The value of y that maximizes $f_1$ is y = (p − q)x for
p ≥ q and y = 0 for p < q. In these problems, we assume
p > q. If p were less than q we would bet that the event
was not successful; the case for p = q is trivial since we
know this much before the pulse is sent. Substituting
for y in (2), we get


$$f_1(x) = \log x + [\log 2 + p \log p + q \log q] \qquad (3)$$


and by induction we can show that


$$f_N(x) = \log x + N(\log 2 + p \log p + q \log q). \qquad (4)$$
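The maximizing wager in (2) and the resulting gain in (3) can be confirmed by a direct search. The Python sketch below is an illustration only: the wager is expressed as a fraction y/x of the current capital, natural logarithms are used, and the function names and grid size are assumptions made for the example.

```python
import math


def one_stage_gain(p, fraction):
    """p*log(1 + y/x) + q*log(1 - y/x): the expected one-stage increase in
    the logarithm of the capital when the fraction y/x is wagered."""
    q = 1.0 - p
    return p * math.log(1.0 + fraction) + q * math.log(1.0 - fraction)


def best_fraction(p, grid=10000):
    """Grid search over wager fractions k/grid for k = 0, ..., grid - 1."""
    best_k = max(range(grid), key=lambda k: one_stage_gain(p, k / grid))
    return best_k / grid


def stated_gain(p):
    """log 2 + p log p + q log q, the bracketed term of (3)."""
    q = 1.0 - p
    return math.log(2.0) + p * math.log(p) + q * math.log(q)


if __name__ == "__main__":
    p = 0.75
    print(best_fraction(p), one_stage_gain(p, 2.0 * p - 1.0), stated_gain(p))
```

For p = 0.75 the search should return a fraction close to p − q = 0.5, and the gain at that fraction should match the bracketed term of (3).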


In Case 2), the gambler, when he bets, will win with
probability $p^2/(p^2 + q^2)$ and lose with probability
$q^2/(p^2 + q^2)$. However, recall that two pulses are required
for each decision and with probability 2pq the two pulses
will not agree and the gambler will not be able to make
a bet. Taking these facts into account, the recurrence
relations are


$$f_N(x) = \max_{0 \le y \le x} \left\{ (p^2 + q^2) \left[ \frac{p^2}{p^2 + q^2} f_{N-2}(x + y) + \frac{q^2}{p^2 + q^2} f_{N-2}(x - y) \right] + 2pq\, f_{N-2}(x) \right\}, \qquad (5)$$
$$f_2(x) = \max_{0 \le y \le x} \left\{ (p^2 + q^2) \left[ \frac{p^2}{p^2 + q^2} \log (x + y) + \frac{q^2}{p^2 + q^2} \log (x - y) \right] + 2pq \log x \right\}. \qquad (6)$$


The value of y that maximizes $f_2$ is

$$y = \frac{p^2 - q^2}{p^2 + q^2}\, x$$

for p ≥ q, and y = 0 for p < q. Recall, we assume p > q.
Substituting y in (6) and proceeding inductively, we get


$$f_N(x) = \log x + \frac{N}{2} [p^2 \log 2p^2 + q^2 \log 2q^2 - (p^2 + q^2) \log (p^2 + q^2)]. \qquad (7)$$


In Case 3), the functional equations are identical to the
equations of Case 1), except that now N is replaced by
N/2 and the value for p is different. The new value for p,
denoted by $p_a$, is the probability that the polarity of the
pulse received is the same as the polarity of the pulse that
was transmitted, obtained if we assume a Gaussian
distribution for p and decrease its variance by $1/\sqrt{2}$.
Clearly, $p_a > p$ since in this case we
are using a sample of two pulses to determine the outcome
of an event whereas in Case 1) we only used a sample of
one pulse. Therefore in Case 3),


$$f_N(x) = \log x + \frac{N}{2} (\log 2 + p_a \log p_a + q_a \log q_a). \qquad (8)$$


These equations were derived by betting the amount $y$
that optimized $f_N(x)$. The values of $f_N(x)$ yield a
quantitative measure for the utility of the communication
channel for the three systems of betting. However,
suppose that practical circumstances prevent the
assumptions made in the formulation of the three equations
from being satisfied. These unavoidable restrictions can
significantly alter the relative merits of the systems. This
problem is considered in the next section.


DISCUSSION


From (4) the expected value of the increase in the
logarithm of the capital per pulse is $(\log 2 + p \log p + q \log q)$;
this is equivalent to the maximum rate of transmission
in bits per symbol and is the same value that
would be obtained using Shannon's formula [1], [7].
Since this is the expression for the maximum rate in bits
per symbol, it cannot be more than doubled by sending
two symbols. Therefore $f_N(x)$ for Case 1) is equal to or
greater than $f_N(x)$ for Case 2) or Case 3). Actually, they
are equal only when $p = q$ and this is trivial since the
gambler bets only when $p > q$. By computing the values
for $f_N(x)$ for the various processes, we find that $f_N(x)$ for
Case 1) is greater for all $p > q$ than $f_N(x)$ for Case 3) and
that $f_N(x)$ for Case 3) is greater for all $p > q$ than $f_N(x)$
for Case 2).
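The ordering of the three cases can be spot-checked numerically. The sketch below evaluates the per-pulse growth rates implied by (4), (7), and (8) in bits. The model used for $p_1$ is an assumption made here for illustration, not a formula from the paper: $p$ is taken as the probability that unit-variance Gaussian noise does not flip the pulse, and the two-pulse average is taken to shrink the noise standard deviation by $1/\sqrt{2}$. The function names are illustrative as well.

```python
from math import log2, sqrt
from statistics import NormalDist

def rate_case1(p):
    """Per-pulse growth from (4): log 2 + p log p + q log q, in bits."""
    q = 1.0 - p
    return 1.0 + p * log2(p) + q * log2(q)

def rate_case2(p):
    """Per-pulse growth from (7): half of the per-decision term."""
    q = 1.0 - p
    p2, q2 = p * p, q * q
    s = p2 + q2
    per_decision = p2 * log2(2.0 * p2) + q2 * log2(2.0 * q2) - s * log2(s)
    return per_decision / 2.0

def rate_case3(p):
    """Per-pulse growth from (8), with p1 from the ASSUMED Gaussian model."""
    standard = NormalDist()
    margin = standard.inv_cdf(p)            # pulse amplitude in noise-std units
    p1 = standard.cdf(sqrt(2.0) * margin)   # averaging two pulses: std / sqrt(2)
    return rate_case1(p1) / 2.0
```

Under this assumed model the three rates agree to leading order as $p$ approaches 1/2 and separate as $p$ grows, in the order Case 1), Case 3), Case 2).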

The processes are in the order of 1), 3), and 2), with 
respect to the magnitude of the rate of transmission in 
bits per symbol. However, for the above results to hold, 
it must be true that two pulses are twice as expensive as 
one pulse. If they are not, then Case 2) and Case 3) are 
favored. Also, it has been assumed that the gambler can 
bet as often as he pleases and that there is no expense in 
making a bet. Neither of these assumptions is necessarily 
correct. If the gambler wins too often, he will find it 
difficult to place bets; if he is wagering at a casino, he 
must pay for each bet that he places. These factors 
strongly favor Case 2) and they also favor Case 3). Of 
course, as $p \to 1$ Case 1) becomes favorable since if there
is a high probability of winning, the gambler would want 
to bet as often as possible. 

At all times, we must consider the gambler and his 
goal, which is to maximize the expected value of the 
logarithm of his capital. If there are any restrictions 
placed upon the gambler’s behavior, then they must be 
included in the functional relationships. Only in this way 
can we determine which of the policies will best serve the 
gambler in the given situation. 

It is difficult to perceive the realism of the gambling 
problem, but the same analysis and comments apply to 
any system in which pulses are received, a decision is 
made on the basis of these pulses, and a quantity of 
available resources utilized according to this decision. 

These factors have obvious counterparts in economic
processes.

The object of this discussion is to show that the utility of 
the communication channel is influenced by factors 
external to it, and that, only by considering the manner in 
which the information will be utilized, can the communi- 
cation system be evaluated. 


THRESHOLD SETTINGS OF A NULL ZONE IN
BINARY PULSE TRANSMISSION


The problem studied by Bloom [2] will now be re- 
analyzed using the foregoing ideas. Bloom shows that for 
binary transmission with symmetrical, additive, Gaussian 
noise, the rate of transmission in bits per symbol is 
increased if a threshold above the zero level is set in the 
receiver and all signals that are received below this 
threshold are rejected. A curve is drawn for the setting 
of the threshold as a function of the signal-to-noise ratio 
of the transmitted pulses. Since this curve is based on 
Shannon’s equation for channel capacity [8], it is assumed 
that the optimal code is used, or, translated into the 
gambling problem, that the gambler is operating under 
ideal conditions. We will show that if restrictions are 
placed upon the gambler, it can be more profitable for 
him to use a threshold setting other than the one that he 
would use under ideal circumstances.

It is assumed that the transmitted pulses are positive
and negative with equal amplitudes and that the noise is
symmetrical, additive Gaussian. Again we consider the
gambler, but this time he can only determine whether a
pulse is received above the positive threshold or below the
negative threshold that he has set in his receiver.
Consequently, the threshold setting alone determines the
probability $p$ that the polarity of the pulse that is received
is the same as the polarity of the pulse that was
transmitted. (Recall that in Case 1), the gambler could
determine the amplitude of the received pulse and thereby
estimate a precise probability for each pulse that he
receives.)

Consider that the amplitudes of the pulses are $\pm\sqrt{P}$.
Then we have the condition of Fig. 1.


The gambler sets the thresholds at $\pm t$ and bets
according to the following policy:


1) If the signal $< -t$, bet $y$ that a negative pulse has
been transmitted.

2) If $-t <$ signal $< +t$, do not bet.

3) If the signal $> t$, bet $y$ that a positive pulse has
been transmitted.


By virtue of symmetry, we need only consider the
positive half of the graph. Let ${}_aA_b$ be the area from $a$ to $b$
under the Gaussian curve with positive mean.
Now, as in the previous example, we define


$f_N(x)$ = the expected value of the logarithm of the
gambler's capital after $N$ pulses have been
transmitted.


If a positive pulse has been transmitted, the gambler
will receive a pulse above the positive threshold with a
probability $p = {}_tA_\infty$. Observe that even if a negative
pulse has been transmitted, with probability $q = {}_{-\infty}A_{-t}$,
the gambler will receive a signal above the positive
threshold. (It is clear that in the former case he would
bet and win, whereas in the latter case he would bet and
lose.) Furthermore, with probability $({}_0A_t + {}_{-t}A_0)$ the
amplitude of the pulse will lie below the threshold level
and no signal will be received. Recall that $x$ is the gambler's
capital at each stage of betting and $y$ is the amount that
he bets.

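From these conditions we can write the functional
equations,

$$f_N(x) = \max_{\substack{0 \le y \le x \\ 0 \le t}} \left[ {}_tA_\infty f_{N-1}(x + y) + {}_{-\infty}A_{-t} f_{N-1}(x - y) + ({}_0A_t + {}_{-t}A_0) f_{N-1}(x) \right], \tag{9}$$

and for the first stage,

$$f_1(x) = \max_{\substack{0 \le y \le x \\ 0 \le t}} \left[ {}_tA_\infty \log (x + y) + {}_{-\infty}A_{-t} \log (x - y) + ({}_0A_t + {}_{-t}A_0) \log x \right]. \tag{10}$$


This can be rewritten as


$$f_1(x) = \max_{\substack{0 \le y \le x \\ 0 \le t}} \left[ {}_tA_\infty \log \frac{x + y}{x} - {}_{-\infty}A_{-t} \log \frac{x}{x - y} + \log x \right]. \tag{11}$$


From this equation we can see that it is always
advantageous to have a threshold set in the receiver (i.e., $t > 0$).
For the zero threshold case, $t = 0$. We will take $f_1(x)$ at
$t = 0$ and subtract it from a value for $f_1(x)$ at $t = t'$.
As $t'$ approaches zero, this difference is positive, thus a
threshold $t > 0$ increases the magnitude of $f_1(x)$.


$$f_1(x)\big|_{t = t'} - f_1(x)\big|_{t = 0} = {}_{-t'}A_0 \log \frac{x}{x - y} - {}_0A_{t'} \log \frac{x + y}{x}. \tag{12}$$


For $t'$ near zero, ${}_0A_{t'} = {}_{-t'}A_0 = K$, and


$$f_1(x)\big|_{t = t'} - f_1(x)\big|_{t = 0} = K \left[ \log \frac{x}{x - y} - \log \frac{x + y}{x} \right], \tag{13}$$


and since $x/(x - y) > (x + y)/x$ $(0 < y < x)$, this value
is positive.
From (11) we can show by induction that


$$f_N(x) = \log x + \max_{\substack{0 \le y \le x \\ 0 \le t}} N \left[ {}_tA_\infty \log \frac{x + y}{x} - {}_{-\infty}A_{-t} \log \frac{x}{x - y} \right]. \tag{14}$$

If $f_N(x)$ is maximized with respect to $y$, i.e.,


$$y = \frac{{}_tA_\infty - {}_{-\infty}A_{-t}}{{}_tA_\infty + {}_{-\infty}A_{-t}}\, x,$$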


we have a value for the expected gain of the gambler, in
$N$ stages, as a function of $t$. Dividing this by $N$ we get
the gain per stage as a function of $t$ and this is precisely
the value that Bloom obtained for the rate in bits per
symbol as a function of the threshold $t$ (see Appendix).
Thus, if we were to draw a graph for the value of $t$ as a
function of the signal-to-noise ratio, it would be identical
to the graph compiled by Bloom. However, we must note
that we used the value of $y$ that maximizes $f_N(x)$. What if
this value of $y$ is not obtainable? In the case of the gambler
this can occur if he were only allowed to wager certain
fixed amounts. Since $f_N(x)$ is a function of both $t$ and $y$,
changing $y$ requires that a new value of $t$ be determined
to maximize $f_N(x)$.
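The gain per stage under the optimal wager can be explored numerically. The sketch below is a minimal illustration, assuming pulses of amplitude $\pm\sqrt{P}$ in zero-mean Gaussian noise of standard deviation `sigma`; the parameter names, the grid search, and the search range are choices made here, not taken from the paper. It evaluates the per-stage gain in bits and locates the threshold that maximizes it.

```python
from math import log2
from statistics import NormalDist

STANDARD = NormalDist()

def tail_probabilities(t, amplitude, sigma=1.0):
    """p: signal above +t given a positive pulse; q: same given a negative pulse."""
    p = 1.0 - STANDARD.cdf((t - amplitude) / sigma)
    q = 1.0 - STANDARD.cdf((t + amplitude) / sigma)
    return p, q

def gain_per_stage(t, amplitude, sigma=1.0):
    """Gain per stage with the optimal wager, in bits:
    p log 2p/(p+q) + q log 2q/(p+q)."""
    p, q = tail_probabilities(t, amplitude, sigma)
    s = p + q
    return p * log2(2.0 * p / s) + q * log2(2.0 * q / s)

def best_threshold(amplitude, sigma=1.0, t_max=3.0, steps=3000):
    """Grid search for the threshold t in [0, t_max] that maximizes the gain."""
    grid = [t_max * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: gain_per_stage(t, amplitude, sigma))
```

For an amplitude equal to one noise standard deviation, the search returns a strictly positive threshold whose gain exceeds the gain at $t = 0$, in line with the argument following (11).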

Regardless of the value of $y$, once it is fixed, a value of
$t$ can be determined that will make the gambler's gain as
great as possible. Thus the curve in Bloom's paper should
be replaced by a family of curves; the graph (for the case
of the gambler) would be one for the threshold setting as
a function of the signal-to-noise ratio and the amounts
that can be wagered at each bet.
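The dependence of the best threshold on a fixed wager can be illustrated in the same way. The sketch below assumes the wager is fixed at a given fraction of the capital and that the noise is Gaussian with standard deviation `sigma`; both the parameterization and the grid search are assumptions made for illustration. It evaluates the bracket of (14) in bits and searches for the threshold that maximizes it.

```python
from math import log2
from statistics import NormalDist

STANDARD = NormalDist()

def gain_fixed_wager(t, fraction, amplitude, sigma=1.0):
    """Bracket of (14) in bits, for a wager fixed at y = fraction * x."""
    p = 1.0 - STANDARD.cdf((t - amplitude) / sigma)   # bet and win
    q = 1.0 - STANDARD.cdf((t + amplitude) / sigma)   # bet and lose
    return p * log2(1.0 + fraction) - q * log2(1.0 / (1.0 - fraction))

def best_threshold_for_wager(fraction, amplitude, sigma=1.0, t_max=3.0, steps=3000):
    """Grid search for the threshold in [0, t_max] that suits this fixed wager."""
    grid = [t_max * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: gain_fixed_wager(t, fraction, amplitude, sigma))
```

With unit amplitude and unit noise, a wager of 0.2 of the capital calls for a threshold near 0.10, while a wager of 0.8 calls for one near 0.50, so each admissible wager has its own threshold curve.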

The significant factor is that when this system is set to
operate most efficiently by assuming an optimal code, it
operates most efficiently only if this optimal code is used.
It is not uniformly efficient for all codes; in fact, the
optimal setting of the threshold is dependent upon the
type of code used.


CONCLUSION 


When an information channel is included in a system,
it becomes part of the system; its most efficient mode of
operation is determined by the remainder of the system
and this includes the manner in which the information
is used.
A detailed analysis of these problems is possible through
the use of the technique of dynamic programming.
Although in some situations the results obtained
correspond to the results of Shannon's equations, the method
can be extended to treat a number of contingent
circumstances of the real world.


APPENDIX 


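We show that for an optimum betting policy, the
results of the analysis of the null-zone reception problem
are the same as those obtained by Bloom.

Eq. (14) for the expected value of the log of the gambler's
capital after $N$ pulses have been transmitted is


$$f_N(x) = \log x + \max_{\substack{0 \le y \le x \\ 0 \le t}} N \left[ {}_tA_\infty \log \frac{x + y}{x} - {}_{-\infty}A_{-t} \log \frac{x}{x - y} \right]. \tag{15}$$


Differentiating (15) with respect to $y$ and setting the
result equal to zero, we see that the value of $y$ that
maximizes $f_N(x)$ is


$$y = \frac{{}_tA_\infty - {}_{-\infty}A_{-t}}{{}_tA_\infty + {}_{-\infty}A_{-t}}\, x. \tag{16}$$


Substituting this value of $y$ in (15) yields:

$$f_N(x) = \log x + N \left[ {}_tA_\infty \log \frac{2\,{}_tA_\infty}{{}_tA_\infty + {}_{-\infty}A_{-t}} + {}_{-\infty}A_{-t} \log \frac{2\,{}_{-\infty}A_{-t}}{{}_tA_\infty + {}_{-\infty}A_{-t}} \right]. \tag{17}$$


It follows directly that the increase in the log of the
gambler's capital per stage (or symbol) is


$${}_tA_\infty \log \frac{2\,{}_tA_\infty}{{}_tA_\infty + {}_{-\infty}A_{-t}} + {}_{-\infty}A_{-t} \log \frac{2\,{}_{-\infty}A_{-t}}{{}_tA_\infty + {}_{-\infty}A_{-t}}. \tag{18}$$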


If, instead of "log of capital" we consider "information,"
then (18) is the rate of information in bits per symbol.
Bloom places all signals falling above the positive
threshold in class $x_1$. He then defines $p(\mathrm{I} \mid x_1)$ and
$p(\mathrm{III} \mid x_1)$ as the conditional probabilities that given a
signal in class $x_1$, it is positive and negative, respectively.
Symmetry allows him to consider only the positive half
of the range. It is apparent that $p(\mathrm{I} \mid x_1) = {}_tA_\infty$ and that
$p(\mathrm{III} \mid x_1) = {}_{-\infty}A_{-t}$. Employing Shannon's equation for
the information rate of a system in bits per symbol,
Bloom obtains


$$p(\mathrm{I} \mid x_1) \log \frac{2p(\mathrm{I} \mid x_1)}{p(\mathrm{I} \mid x_1) + p(\mathrm{III} \mid x_1)} + p(\mathrm{III} \mid x_1) \log \frac{2p(\mathrm{III} \mid x_1)}{p(\mathrm{I} \mid x_1) + p(\mathrm{III} \mid x_1)}. \tag{19}$$


This is identical to (18).


Capacity of a Certain Asymmetrical Binary Channel
with Finite Memory*


SZE-HOU CHANG†


Summary—The capacity of a certain asymmetrical binary chan- 
nel is studied under the following conditions. 1) Blocks of equal 
numbers of binary digits are used as the transmitting symbols. 
2) The channel resumes its quiescent state at the beginning of each 
block. 3) The memory of the channel is characterized by the de- 
pendence of the noise probabilities for each digit upon the pre- 
ceding digit or digits in the same block. It is shown that, by means 
of simple rules and with the aid of a single set of curves or a table, 
the calculation of the capacity can be reduced to a routine process. 


INTRODUCTION 


In recent years the binary channel has become
increasingly important in the storage and transmission
of information. There are numerous developments
of physical devices for use in such channels, as well as
theoretical investigations¹⁻⁴ of binary coding and
detection schemes. One of the most important criteria upon
which the design of these devices and schemes is based
is the channel capacity as defined by Shannon.⁵

A symmetrical binary channel, abbreviated as SBC, is
a channel in which the noise probabilities⁶ are identical
for the two digits, 0 and 1, as indicated by Fig. 1. The
capacity of an SBC is well known,⁵ and is given by


$$C = 1 + \alpha \log \alpha + (1 - \alpha) \log (1 - \alpha) \ \text{bit}, \tag{1}$$


where the log is taken to the base 2. 
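Equation (1) is simple to evaluate directly. The short sketch below does so with base-2 logarithms; the helper name `plogp` and the convention that $0 \log 0 = 0$ at the endpoints are choices made here for illustration.

```python
from math import log2

def plogp(p):
    """p log2 p, with the limiting value 0 at p = 0."""
    return p * log2(p) if p > 0.0 else 0.0

def sbc_capacity(alpha):
    """Capacity of the symmetrical binary channel from (1), in bits per digit."""
    return 1.0 + plogp(alpha) + plogp(1.0 - alpha)
```

The capacity vanishes at $\alpha = 0.5$, equals one bit at $\alpha = 0$ or $\alpha = 1$, and is symmetric about $\alpha = 0.5$.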

While the maximization procedure used in deriving
(1) is relatively simple, it becomes rather involved when
the asymmetrical binary channel (ABC) (see Fig. 2) or
more general channels are considered. Muroga⁷ has
simplified the computation of the capacity in a certain
class of discrete channels by introducing an auxiliary
column matrix containing elements $Q_i$ which are
essentially equivalent to the logarithms of the noise
probabilities. Using this technique, Silverman⁸ has made a
detailed analysis of the asymmetrical binary channel.
However, his analysis deals with the case where the noise
probabilities remain constant for successive digits, and
therefore is applicable only to the asymmetrical binary
channel with no memory, ABC (wnm).


* Manuscript received by the PGIT, June 5, 1958. This research
was made possible through the support of the Electronics Res. Div.
of the AF Cambridge Res. Center, Air Res. and Dev. Command.

† Northeastern University, Boston, Mass.

¹ R. W. Hamming, "Error detecting and error correcting codes,"
Bell Sys. Tech. J., vol. 29, pp. 147-160; April, 1950.

² I. S. Reed, "A class of multiple-error-correcting codes and the
decoding scheme," IRE Trans. on Information Theory, no.
PGIT-4, pp. 38-49; September, 1954.

³ P. Elias, "Coding for Two Noisy Channels" in "Information
Theory," C. Cherry, ed., Thornton Butterworth, Ltd., London,
Eng., pp. 61-74; 1956.

⁴ D. Slepian, "A class of signaling alphabets," Bell Sys. Tech. J.,
vol. 35, pp. 203-234; January, 1956.

⁵ C. E. Shannon and W. Weaver, "The Mathematical Theory of
Communications," University of Illinois Press, Urbana, Ill.; 1949.

⁶ Noise probabilities are the set of transitional probabilities
p(y | x) when x is transmitted and y is received.

⁷ S. Muroga, "On the capacity of a discrete channel I," J. Phys.
Soc., Japan, vol. 8, pp. 484-494; July-August, 1953.

⁸ R. A. Silverman, "On binary channels and their cascades,"
IRE Trans. on Information Theory, vol. IT-1, pp. 19-27;
December, 1955.

The purpose of this paper is to use Muroga’s method 
to compute the capacity of a certain asymmetrical binary 
channel with finite memory, ABC (wfm). The operation 
of this channel is as follows. First, the binary digits are 
transmitted in blocks having an equal number of digits. 
Second, there is sufficient time spacing to prevent inter- 
ference between blocks so that the channel always 
resumes its quiescent state at the beginning of each 
block. Third, the memory of the channel is characterized 
by the dependence of the noise probabilities for each 
digit upon the preceding digit or digits in the same block. 


[Figure: transmitted symbols x, noise probabilities, received symbols y.]


Fig. 1—Symmetrical binary channel. 


[Figure: transmitted symbols x, noise probabilities, received symbols y.]


Fig. 2—Asymmetrical binary channel. 


A block with $n$ digits will be referred to as an $n$-plet.
The following discussion shows that the computation of 
the capacities of an ABC (wfm) using doublets, triplets, 
etc., is based upon the capacity computation of a single 
digit transmission, or more precisely, upon that of an 
ABC (wnm). Therefore, the latter case is reviewed first. 
In so doing, a set of curves and its associated table shall 
be introduced. These are not discussed in Silverman’s 
paper but are found extremely useful in the following 
sections. 


ASYMMETRICAL BINARY CHANNEL 
(with No Memory) 


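The noise probability matrix of an ABC (wnm) is
indicated by Fig. 2 and


$$A = \begin{bmatrix} \alpha & 1 - \alpha \\ 1 - \beta & \beta \end{bmatrix}. \tag{2}$$


The corresponding noise entropy matrix is given by

$$H = \begin{bmatrix} H(\alpha) \\ H(\beta) \end{bmatrix} = \begin{bmatrix} -\alpha \log \alpha - (1 - \alpha) \log (1 - \alpha) \\ -\beta \log \beta - (1 - \beta) \log (1 - \beta) \end{bmatrix}. \tag{3}$$


Muroga introduced an auxiliary column matrix $Q$ which
is defined⁹ as


$$Q = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} = -A^{-1} H. \tag{4}$$


Upon the completion of matrix inversion and
multiplication, the $Q$'s are explicitly determined; thus,


$$\begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} = \frac{-1}{\alpha + \beta - 1} \begin{bmatrix} \beta H(\alpha) + (\alpha - 1) H(\beta) \\ (\beta - 1) H(\alpha) + \alpha H(\beta) \end{bmatrix}. \tag{5}$$


The capacity of an ABC (wnm) can be expressed in terms
of the $Q$'s by


$$C = \log \sum_{i=1}^{2} 2^{Q_i} = \log (2^{Q_1} + 2^{Q_2}). \tag{6}$$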

This procedure was used by Silverman in the study of
the ABC (wnm). Readers are referred to the original
paper for the detailed graphs of $C$ and other quantities
as functions of $\alpha$ and $\beta$.⁸ In the following discussion, similar
procedures are used to obtain the capacity of the ABC
(wfm). It then becomes evident that it is convenient to
use $Q_1$ and $Q_2$ given by (5) as the building elements of
more complex $Q$ matrices which in turn determine the
capacities of different channels. Therefore, it is desirable
to calculate the values of $Q_1$ and $Q_2$ as functions of $\alpha$ and
$\beta$. In Fig. 3, the contours of $Q_1$'s are plotted in the $\alpha$-$\beta$
plane. Because of the inherent symmetry between the
expressions of $Q_1$ and $Q_2$, the same contours may be used
to read $Q_2$ by simply interchanging the coordinates $\alpha$
and $\beta$.
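The procedure of (3) through (6) can be sketched in a few lines. The code below is an illustration only: the function names are introduced here, logarithms are base 2, and the singular case $\alpha + \beta = 1$ is rejected rather than handled. It evaluates $Q_1$ and $Q_2$ from the explicit expressions and then the capacity.

```python
from math import log2

def entropy(p):
    """H(p) = -p log p - (1 - p) log (1 - p) in bits, with 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)

def muroga_q(alpha, beta):
    """Q_1 and Q_2 of the ABC (wnm) from the explicit form (5)."""
    det = alpha + beta - 1.0
    if det == 0.0:
        raise ValueError("alpha + beta = 1: the noise matrix is singular")
    h_a, h_b = entropy(alpha), entropy(beta)
    q1 = -(beta * h_a + (alpha - 1.0) * h_b) / det
    q2 = -((beta - 1.0) * h_a + alpha * h_b) / det
    return q1, q2

def abc_capacity(alpha, beta):
    """Capacity from (6): C = log (2^Q1 + 2^Q2), in bits per digit."""
    q1, q2 = muroga_q(alpha, beta)
    return log2(2.0 ** q1 + 2.0 ** q2)
```

For $\alpha = 0.80$, $\beta = 0.90$ this gives $Q_1 \approx -0.79419$ and $Q_2 \approx -0.43286$, and interchanging the two arguments swaps the two values, which is the symmetry used to read $Q_2$ from the $Q_1$ contours.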

In practice, the values of $\alpha$ and $\beta$ are more commonly
found in the range of 0.5 to 1. Table I lists the computed
values of $Q_1$ corresponding to this region of $\alpha$ and $\beta$ with
finer increments of both parameters.

To illustrate the use of the curves, the channel capacity
of an ABC (wnm) as a function of $\alpha$ is computed for a
few selected values of $\beta$ by means of (6) and Fig. 3.
Fig. 4¹⁰ shows these results together with one additional
curve, that of a symmetrical binary channel in which $\beta$
is always set equal to $\alpha$. It is interesting to note that the
channel capacity is greater than zero except when
$\alpha + \beta = 1$, in which case transmission of information is
impossible because there is no change in the probabilities
of the transmitted digits upon the reception of either of
the two digits. In other words, the a posteriori probabilities


⁹ The notations in Muroga's and Silverman's papers are different
from those used here.

¹⁰ These curves were first obtained by R. D. Klein of
Northeastern University using direct maximization independently of
Silverman.
Fig. 3—Values of $Q_1$ for $0 \le \alpha \le 1$ and $0 \le \beta \le 1$. (Interchange
$\alpha$ and $\beta$ to read $Q_2$.)


[Figure: vertical axis C, bits.]


Fig. 4—Capacity of symmetrical and asymmetrical binary channel. 


of the digits are the same as the a priori probabilities. Of
further interest is the fact that in the vicinity of the
zero-capacity point the capacity can be improved either
by increasing $\alpha$, the probability of correct recognition of
0, or by decreasing it. In either way the equivocation is
reduced. It also should be noted that the full capacity
of 1 bit per digit is reached when both $\alpha$ and $\beta$ are equal
to unity or when both are equal to zero. In the latter case
an exchange of the roles of the two received digits results
in a perfect channel.
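These limiting statements can be checked by direct maximization of the mutual information over the input probabilities, which also works on the line $\alpha + \beta = 1$ where the explicit $Q$ expressions are singular. The sketch below is a plain grid search written for illustration; the function names and the grid size are assumptions, and a compact copy of the $Q$-based formula is included only for comparison.

```python
from math import log2

def entropy(p):
    """Binary entropy in bits, with 0 log 0 taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)

def mutual_information(w, alpha, beta):
    """I(X;Y) in bits when digit 0 is transmitted with probability w."""
    prob_receive_0 = w * alpha + (1.0 - w) * (1.0 - beta)
    return entropy(prob_receive_0) - w * entropy(alpha) - (1.0 - w) * entropy(beta)

def capacity_direct(alpha, beta, steps=20000):
    """Maximize I(X;Y) over w on a uniform grid of steps + 1 points."""
    return max(mutual_information(i / steps, alpha, beta) for i in range(steps + 1))

def capacity_muroga(alpha, beta):
    """Capacity via Q_1 and Q_2; undefined when alpha + beta = 1."""
    det = alpha + beta - 1.0
    h_a, h_b = entropy(alpha), entropy(beta)
    q1 = -(beta * h_a + (alpha - 1.0) * h_b) / det
    q2 = -((beta - 1.0) * h_a + alpha * h_b) / det
    return log2(2.0 ** q1 + 2.0 ** q2)
```

On the grid, the capacity is zero for $\alpha = 0.3$, $\beta = 0.7$, becomes positive when $\alpha$ is moved in either direction, and reaches one bit at $\alpha = \beta = 1$ and at $\alpha = \beta = 0$.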


ASYMMETRICAL BINARY CHANNEL WITH FINITE 
Memory FOR THE DOUBLET 


The simplest case of an ABC (wfm), one that uses a
doublet in a block, is studied now. Altogether there


TABLE I*
VALUES OF $Q_1$ FOR $0.5 \le \alpha, \beta \le 1.0$
$\Delta\alpha = \Delta\beta = 0.02$


β \ α 0.50 0.52 0.54 0.56 0.58 0.60 0.62 0.64 0.66
0.50  —1.000000 —0.971125 —0.942225 -—0.913225 —0.884087 —0.854750  -—0.825175 —0.795296 —0.765056 
0.52 1.028875 0. 998845 0.968798 0.938667 0.908412 0.877966 0.847290 0.816318 0.784989 
0.54 1.057775 1.026581 0.995377 0.964107 0.932720 0.901155 0.869369 0.837292 0.804866
0.56 1.086775 1.054392 1.022017 0. 989586 0.957197 0.924357 0.891446 0. 858256 0.824720 
0.58 1.115912 1.082322 1.048753 1.015147 0.981453 0.947607 0.913559 0.879239 0.844583 
0.60 1. 145250 1.110425 1.075641 1.040838 1.005963 0.970950 0.935746 0.900282 0.864491 
0.62 1.174825 1.138740 1.102719 1.066696 1.030619 0.994419 0.958041 0.921416 0.884474 
0.64 1.204704 1.167331 1. 130042 1.092776 1.057744 1.018061 0.980490 0.942682 0.904571 
0.66 1.234944 1.196250 1.157666 1.119125 1.080566 1.041922 1.003131 0.964121 0.924818 
0.68 1. 265608 1.225558 1.185644 1.145798 1. 105956 1.066048 1.026012 0.985772 0.945255 
0.70 1.296772 1. 255326 1.214044 1.172857 1.131698 1.090495 1.049183 1.007686 0.965926 
0.72 1.328520 1, 285633 1.242041 1.200372 1.157858 1.115323 1.072702 1.029915 0.986883 
0.74 1.360946 1.316566 1.272416 1. 228420 1. 184508 1.140601 1.096632 1.052517 1.008179 
0.76 1.394154 1.348225 1.302563 1.257089 1.211730 1. 166405 1.121043 1.075561 1.029876 
0.78 1.428273 1.380729 1.333494 1. 286483 1. 239622 1.192826 1. 146023 1.099125 1.052048 
0.80 1.463453 1.414220 1.365339 1.316725 1. 268298 1.219972 1.171669 1, 123300 1.074780 
0.82 1.499880 1.448870 1.398262 1.347966 1.297900 1.247971 1.198102 1. 148200 1.098176 
0.84 1.537781 1. 484893 1.432456 1.380392 1.328599 1.276987 1.225473 1.173963 1. 122364 
0.86 1.577446 1.522557. 1.468188 1.414237 1.360613 1.307220 1. 253969 1.200762 1.147504 
0.88 1.619262 1.562226 1.506147 1.449812 1.394234 1.338940 1. 283839 1, 228828 1.173809 
0.90 1.663756 1.604387 1.545687 1.487544 1.429856 1.372514 1.315422 1.258475 1.201567 
0.92 1.711692 1.649785 1.588577 1.528044 1.468045 1.408466 1.349204 1.290150 1.231192 
0.94 1.764267 1.699436 1.635480 1.572272 1.509692 1.447620 1.385947 1.324554 1.263329
0.96 1.823598 1.755397 1.688217 1.621913 1.556357 1.491420 1.426981 1.362917 1.299106 
0.98 1.894333 1.821953 1.750784 1. 680669 1.611464 1.543025 1.475223 1.407920 1.340987 
1.00 2.000000 1.920855 1.843292 1.767119 1.692162 1.618250 1.545229 1.472942 1.401237 
β \ α 0.68 0.70 0.72 0.74 0.76 0.78 0.80 0.82 0.84
0.50  —0.734391 —0.703227 —0.671479 —0.639054 —0.605846  —0.571726 —0.536546 —0.500120 —0.462219 
0.52 0.753238 0.720990 0.688157 0.654646 0.620350 0.585136 0.548855 0.511317 0.472293 
0.54 0.772021 0.738682 0.704760 0.670158 0.634769 0.598459 0.561075 0.522426 0. 482280 
0.56 0.790773 0.756334 0.721315 0.685617 0.649129 0.611718 0.573228 0.533466 0.492197 
0.58 0.809521 0.773973 0.737848 0.701045 0.663453 0.624936 0.585335 0.544457 0.502063 
0.60 0.828302 0.791632 0.754389 0.716472 0.677766 0.638134 0.597417 0.555417 0.511894 
0.62 0.847142 0.809649 0.770964 0.731921 0.692091 0.651335 0.609492 0.566363 0.521706 
0.64 0.866078 0.827121 0.787603 0.747420 0.706454 0.664563 0.621584 0.577318 0.531517 
0.66 0.885440 0.845018 0.805285 0.762999 0.720881 0.677841 0.633715 0.588299 0.541346 
0.68 0.904381 0.863062 0.821200 0.778686 0.735399 0.691195 0.645905 0.599327 0.551209 
0.70 0.923825 0.881291 0.838224 0.794514 0.750039 0.704651 0.658182 0.610425 0.561129 
0.72 0.943525 0.899748 0.855450 0.810521 0.764834 0.718242 0.670573 0.621618 0.571125 
0.74 0.963531 0.918480 0.872923 0.826745 0.779821 0.731979 0.683106 0.632933 0.581222 
0.76 0.983901 0.937541 0.890690 0. 843233 0.795040 0.745959 0.695816 0.644398 0.591447 
0.78 1.004703 0.956993 0.908810 0.860035 0.810539 0.760167 0.708742 0. 656050 0.601829 
0.80 1.026016 0.976908 0.927348 0.877213 0.826373 0.774671 0.721928 0. 667926 0.612404 
0.82 1.047935 0.997375 0.946385 0.894842 0.842610 0.789533 0.735428 0.680076 0.623213 
0.84 1.070579 1.018503 0.966021 0.913010 0.859332 0.804826 0.749308 0.692559 0.634303 
0.86 1.094094 1.040426 0.986381 0.931832 0.876640 0.820642 0.763651 0.705445 0.655753 
0.88 1.118678 1.063323 1.007626 0.951455 0.894665 0.837102 0.778565 0.718832 0.657630 
0.90 1. 144593 1.087439 1.029979 0.972082 0.913601 0.854369 0.794194 0.732847 0.670052 
0.92 1.172222 1.113119 1.053757 0.993999 0.933696 0.872688 0.810747 0.747948 0.683178
0.94 1.202154 1. 140906 1.079453 1.017655 0.955358 0.892387 0.828545 0.763595 0.697255 
0.96 1. 236206 1.171745 1. 107928 1.043829 0.979289 0.914129 0.848148 0.781104 0.712712 
0.98 1.274291 1.207695 1.141055 1.074217 1.007018 0.939272 0.870771 0.801270 0.730478 
1.00 1.329972 1. 258987 1.188126 1.117224 1.046105 0.974573 0.902410 0.829362 0.755129 
β \ α 0.86 0.88 0.90 0.92 0.94 0.96 0.98 1.00
0.50 —0.422554 —0.380738 —0.336243 —0.288308  —0.235732  —0.176404 0.105666 = —0.000000 
0.52 0.431489 0.388515 0.342840 0.293694 0. 239871 0.179287 0.107143 0.000000 
0.54 0.440340 0.396213 0.349362 0.299014 0. 243953 0. 182045 0. 108596 0.000000 
0.56 0.449122 0.403845 0.355822 0.304277 0.247988 0. 184807 0.110027 0.000000 
0.58 0.457852 0.411423 0.362232 0.309495 0.251982 0.187539 0.111439 0.000000 
0.60 0.466544 0.418963 0.368604 0.314675 0.255953 0.190245 0.112836 0.000000 
0.62 0.475213 0.426477 0.374947 0.319828 0. 259166 0.192929 0.114220 0.000000 
0.64 0.483874 0.433978 0.381275 0.324964 0. 263799 0.195599 0.115593 0.000000 
0.66 0.492544 0.441481 0.387598 0.330090 0.267707 0.198258 0.116959 0.000000 
0.68 0.501239 0.448999 0.393928 0.335218 0.271612 0.200911 0.118320 0.000000 
0.70 0.509976 0.456547 0.400279 0.340358 0.275521 0.203564 0.119679 0.000000 
0.72 0.518774 0.464143 0.406663 0.345520 0.279444 0. 206223 0.121039 0.000000 
0.74 0.527654 0.471802 0.413096 0.350716 0. 283388 0.208894 0.122403 0.000000 
0.76 0.536638 0.479546 0.419594 0.355960 0. 287365 0.211583 0.123775 0.000000 
0.78 0.545754 0.487396 0.426175 0.361266 0.291384 0.214298 0.125157 0.000000 
0.80 0.555032 0.495378 0.432861 0.366651 0.295459 0.217048 0.126555 0.000000 
0.82 0.564507 0.503524 0.439678 0.372135 0.299605 0.219841 0.127974 0.000000 
0.84 0.574225 0.511869 0.446655 0.377744 0.303840 0.222691 0.129418 0.000000 
0.86 0.584238 0.520461 0.453831 0.383506 0.308185 0.225611 0. 130897 0.000000 
0.88 0.594621 0.529361 0.461255 0.389460 0.312670 0. 228622 0.132418 0.000000 
0.90 0. 605468 0.538648 0.468995 0.395660 0.317334 0.231747 0.133995 0.000000 
0.92 0.616916 0.548438 0.477143 0.402179 0.322231 0. 235024 0.135645 0.000000 
0.94 0.629178 0.558909 0.485846 0.409131 0.327445 0. 238507 0. 137396 0.000000 
0.96 0.642620 0.570370 0.495355 0.416714 0.333121 0.242291 0.139294 0.000000 
0.98 0.658038 0.583489 0.506217 0.425355 0.339575 0. 246583 0.141440 0.000000 
1.00 0.679347 0.601546 0.521105 0.437151 0.348345 0. 252387 0. 144326 0.000000 


*Interchange $\alpha$ and $\beta$ to read $Q_2$. Values of $Q$'s are negative or zero.


are four types of doublets, namely, 00, 01, 10, and 11.
If the decision in the detection process is made on a block
by block basis rather than on a digit by digit basis, then
the channel is described by a 4 × 4 noise probability
matrix associating the four transmitted doublets with the
four received doublets, as shown below.


$$A = \begin{array}{c|cccc} x \backslash y & 00 & 01 & 10 & 11 \\ \hline 00 & a_{11} & a_{12} & a_{13} & a_{14} \\ 01 & a_{21} & a_{22} & a_{23} & a_{24} \\ 10 & a_{31} & a_{32} & a_{33} & a_{34} \\ 11 & a_{41} & a_{42} & a_{43} & a_{44} \end{array} \tag{7}$$


where $a_{ij} = p(y_j \mid x_i)$ is the conditional probability that
the doublet $y_j$ is received when the doublet $x_i$ is
transmitted. Because of the requirement that


$$\sum_{j=1}^{4} a_{ij} = 1, \tag{8}$$


only 12 out of the 16 parameters can be assigned
independently.
Muroga's method as outlined in (4), (5), and (6) may
be used to compute the channel capacity under this most
general condition of doublet transmission. This by itself
eliminates a considerable amount of computation as
compared with the direct maximization process. However,
the calculations are even more simplified if the decision
at detection is made on a digit by digit basis and the
channel memory is assumed to affect only the noise
probabilities of the second digit.

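Three noise probability matrices $A_1$, $A_2'$, and $A_2''$ may
be defined as follows.

For the first digit, which is always sent during the
quiescent state of the channel:


$$A_1 = \begin{bmatrix} \alpha_1 & 1 - \alpha_1 \\ 1 - \beta_1 & \beta_1 \end{bmatrix}. \tag{9}$$


For the second digit following "0":


$$A_2' = \begin{bmatrix} \alpha_2' & 1 - \alpha_2' \\ 1 - \beta_2' & \beta_2' \end{bmatrix}. \tag{10}$$


For the second digit following "1":


$$A_2'' = \begin{bmatrix} \alpha_2'' & 1 - \alpha_2'' \\ 1 - \beta_2'' & \beta_2'' \end{bmatrix}. \tag{11}$$


It is to be noted that these matrices are similar in structure
to the one given by (2) for single digit blocks in the ABC
(wnm). The matrix for the doublets as given above in (7)
can be considered, in the present case, to be composed of
the elements of these three submatrices in the following
manner.


$$A = \begin{bmatrix} \alpha_1 A_2' & (1 - \alpha_1) A_2' \\ (1 - \beta_1) A_2'' & \beta_1 A_2'' \end{bmatrix} = \begin{bmatrix} \alpha_1 \alpha_2' & \alpha_1 (1 - \alpha_2') & (1 - \alpha_1) \alpha_2' & (1 - \alpha_1)(1 - \alpha_2') \\ \alpha_1 (1 - \beta_2') & \alpha_1 \beta_2' & (1 - \alpha_1)(1 - \beta_2') & (1 - \alpha_1) \beta_2' \\ (1 - \beta_1) \alpha_2'' & (1 - \beta_1)(1 - \alpha_2'') & \beta_1 \alpha_2'' & \beta_1 (1 - \alpha_2'') \\ (1 - \beta_1)(1 - \beta_2'') & (1 - \beta_1) \beta_2'' & \beta_1 (1 - \beta_2'') & \beta_1 \beta_2'' \end{bmatrix}. \tag{12}$$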


Thus, the probabilities that a transmitted doublet 00 is
received as 00, 01, 10, and 11 are respectively $\alpha_1 \alpha_2'$,
$\alpha_1 (1 - \alpha_2')$, $(1 - \alpha_1) \alpha_2'$, and $(1 - \alpha_1)(1 - \alpha_2')$. This
matrix contains only six independent parameters, as
compared with 12 in the general case given by (7).¹¹ It
is to be expected that some simplification may be found by
exploiting its special structure.
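The construction of the doublet matrix from the three single-digit matrices can be written out as a short sketch. It follows the row pattern just described for the doublet 00 and applies the same rule to the other three rows; the function names and the nested-list representation are choices made here for illustration.

```python
def digit_matrix(alpha, beta):
    """2 x 2 noise probability matrix: row = digit sent, column = digit received."""
    return [[alpha, 1.0 - alpha], [1.0 - beta, beta]]

def doublet_matrix(first, second_after_0, second_after_1):
    """4 x 4 matrix for doublets 00, 01, 10, 11 (rows sent, columns received).

    The second digit uses second_after_0 or second_after_1 according to
    the first TRANSMITTED digit, as in the row for 00 described in the text.
    """
    rows = []
    for sent_first in (0, 1):
        second = second_after_0 if sent_first == 0 else second_after_1
        for sent_second in (0, 1):
            rows.append([
                first[sent_first][got_first] * second[sent_second][got_second]
                for got_first in (0, 1)
                for got_second in (0, 1)
            ])
    return rows
```

Every row of the result sums to one, so the six parameters determine all sixteen entries.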
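A useful way of decomposing the matrix of (12) is


$$A = \begin{bmatrix} A_2' & 0 \\ 0 & A_2'' \end{bmatrix} \begin{bmatrix} \alpha_1 & 1 - \alpha_1 \\ 1 - \beta_1 & \beta_1 \end{bmatrix} = \begin{bmatrix} A_2' & 0 \\ 0 & A_2'' \end{bmatrix} A_1. \tag{13}$$


Here the submatrices $A_2'$ and $A_2''$ should be considered as
if they were merely elements during the multiplication
with the elements of $A_1$, and expanded thereafter if
desired. Using this method, the inversion of $A$ as required
in (4) is much simplified:


$$A^{-1} = A_1^{-1} \begin{bmatrix} (A_2')^{-1} & 0 \\ 0 & (A_2'')^{-1} \end{bmatrix}.$$


It is to be noted that the inversion of the 4 × 4 matrix is
effected by the inversion of three 2 × 2 matrices all of
which are similar in nature, for example,


$$A_1^{-1} = \frac{1}{\alpha_1 + \beta_1 - 1} \begin{bmatrix} \beta_1 & \alpha_1 - 1 \\ \beta_1 - 1 & \alpha_1 \end{bmatrix} = \begin{bmatrix} \gamma_1 & 1 - \gamma_1 \\ 1 - \theta_1 & \theta_1 \end{bmatrix}. \tag{14}$$


The last step also defines two new parameters $\gamma_1$ and $\theta_1$.
The inverses of $A_2'$ and $A_2''$ are obtained in the same
manner and result in similar parameters, $\gamma_2'$, $\theta_2'$ and $\gamma_2''$, $\theta_2''$.

In order to form the auxiliary matrix $Q$, as required in
(4), the noise entropy matrix should be computed first.
This is easily done from (12) by computing on a row by
row basis. After simplification, we get


$$H = \begin{bmatrix} H(\alpha_1) + H(\alpha_2') \\ H(\alpha_1) + H(\beta_2') \\ H(\beta_1) + H(\alpha_2'') \\ H(\beta_1) + H(\beta_2'') \end{bmatrix} \tag{15}$$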
or 


ipa (16) 


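where all elements are defined in exactly the same way as
in (3) except for appropriate subscripts and superscripts.
Finally, the matrix $Q$ is computed.


$$Q = \begin{bmatrix} Q_1 \\ Q_2 \\ Q_3 \\ Q_4 \end{bmatrix} = \begin{bmatrix} Q_{11} \\ Q_{11} \\ Q_{12} \\ Q_{12} \end{bmatrix} + \begin{bmatrix} \gamma_1 & 1 - \gamma_1 \\ 1 - \theta_1 & \theta_1 \end{bmatrix} \begin{bmatrix} \begin{bmatrix} Q_{21}' \\ Q_{22}' \end{bmatrix} \\[4pt] \begin{bmatrix} Q_{21}'' \\ Q_{22}'' \end{bmatrix} \end{bmatrix}. \tag{17}$$


¹¹ It can be shown that the ratio of the two numbers is $2^{n-1}$ for
the $n$-plet.

In this expression the submatrices
$\begin{bmatrix} Q_{11} \\ Q_{12} \end{bmatrix}$, $\begin{bmatrix} Q_{21}' \\ Q_{22}' \end{bmatrix}$, and $\begin{bmatrix} Q_{21}'' \\ Q_{22}'' \end{bmatrix}$
are functions of $A_1$, $A_2'$, and $A_2''$, respectively, in
the same way that the matrix $\begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix}$ is the function of $A$
in (5) of ABC (wnm). Therefore, the values of their
elements can be read directly from Fig. 3 or Table I with
appropriate values of $\alpha$ and $\beta$. The submatrices
$\begin{bmatrix} Q_{11} \\ Q_{11} \end{bmatrix}$ and $\begin{bmatrix} Q_{12} \\ Q_{12} \end{bmatrix}$
are formed by iterating the elements of $\begin{bmatrix} Q_{11} \\ Q_{12} \end{bmatrix}$.
An explanation of the notation is perhaps in order. The
first subscript and the superscript, if any, of each $Q_{ij}$
denote the correspondence to the original matrices, i.e.,
the $A$'s, which generate them. The second subscript, 1 or
2, denotes the position of the $Q_{ij}$ in the submatrix
$\begin{bmatrix} Q_{i1} \\ Q_{i2} \end{bmatrix}$ according to (5).

After obtaining the proper values of $Q_{ij}$'s from the
curves of Fig. 3 or Table I, the calculation of the four
elements of the over-all $Q$ matrix is straightforward.
Thus, referring to (17),


$$\begin{aligned} Q_1 &= Q_{11} + \gamma_1 Q_{21}' + (1 - \gamma_1) Q_{21}'' \\ Q_2 &= Q_{11} + \gamma_1 Q_{22}' + (1 - \gamma_1) Q_{22}'' \\ Q_3 &= Q_{12} + (1 - \theta_1) Q_{21}' + \theta_1 Q_{21}'' \\ Q_4 &= Q_{12} + (1 - \theta_1) Q_{22}' + \theta_1 Q_{22}''. \end{aligned} \tag{18}$$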


Finally, the channel capacity is given, analogous to (6), by

$$C = \log \sum_{i=1}^{4} 2^{Q_i} = \log (2^{Q_1} + 2^{Q_2} + 2^{Q_3} + 2^{Q_4}) \ \text{bits/doublet}. \tag{19}$$


We shall illustrate the steps by a numerical example 
and discuss (19) under a few special conditions. 


1) Assume, for a certain channel transmitting doublets,

$$\begin{aligned} \alpha_1 &= 0.80 & \beta_1 &= 0.90 \\ \alpha_2' &= 0.70 & \beta_2' &= 0.75 \\ \alpha_2'' &= 0.50 & \beta_2'' &= 0.60. \end{aligned} \tag{20}$$


From Table I, the following $Q_{ij}$'s are read:


IRE TRANSACTIONS ON INFORMATION THEORY 


December 


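$$
\begin{aligned}
Q_{11} &= -0.794194, & Q_{12} &= -0.432861, \\
Q_{21}' &= -0.927966, & Q_{22}' &= -0.772381, \\
Q_{31}'' &= -1.145250, & Q_{32}'' &= -0.854750.
\end{aligned} \tag{21}
$$

From (18),

$$
\begin{aligned}
Q_1 &= -1.660079, & Q_2 &= -1.543040, \\
Q_3 &= -1.609151, & Q_4 &= -1.299384.
\end{aligned} \tag{22}
$$

The resultant capacity is

$$C = 0.478922 \ \text{bits/doublet}. \tag{23}$$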
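The numerical example can be retraced with a short script. This is only a sketch under stated assumptions: the table lookup is replaced by the closed form $Q = -A^{-1}H$ for a single binary channel with noise matrix $[[\alpha, 1-\alpha], [1-\beta, \beta]]$, and the weights are taken as $\gamma_1 = \beta_1/(\alpha_1+\beta_1-1)$ and $\delta_1 = \alpha_1/(\alpha_1+\beta_1-1)$. Neither definition appears in this section; they are assumed here because they reproduce (22). The function names are illustrative.

```python
from math import log2


def entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def q_pair(alpha, beta):
    """Q-values of one binary channel [[alpha, 1-alpha], [1-beta, beta]].

    Assumed closed form of Q = -A^{-1} H; it stands in for reading Table I.
    """
    det = alpha + beta - 1.0  # determinant of the 2 x 2 noise matrix
    h_a, h_b = entropy(alpha), entropy(beta)
    q1 = (-beta * h_a + (1.0 - alpha) * h_b) / det
    q2 = ((1.0 - beta) * h_a - alpha * h_b) / det
    return q1, q2


def doublet_capacity(a1, b1, a2, b2, a3, b3):
    """Return ([Q1, Q2, Q3, Q4], C) following the pattern of (18) and (19)."""
    det = a1 + b1 - 1.0
    gamma1 = b1 / det  # assumed definition of gamma_1
    delta1 = a1 / det  # assumed definition of delta_1
    q11, q12 = q_pair(a1, b1)
    q21, q22 = q_pair(a2, b2)  # second digit after a transmitted 0
    q31, q32 = q_pair(a3, b3)  # second digit after a transmitted 1
    q = [
        q11 + gamma1 * q21 + (1.0 - gamma1) * q31,
        q11 + gamma1 * q22 + (1.0 - gamma1) * q32,
        q12 + (1.0 - delta1) * q21 + delta1 * q31,
        q12 + (1.0 - delta1) * q22 + delta1 * q32,
    ]
    capacity = log2(sum(2.0 ** value for value in q))
    return q, capacity


q_values, capacity = doublet_capacity(0.80, 0.90, 0.70, 0.75, 0.50, 0.60)
print([round(v, 6) for v in q_values], round(capacity, 6))
```

Small differences in the last digits relative to (21) to (23) are to be expected, since the script evaluates the entropies directly instead of reading a rounded table.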


2) If the first digit is errorless in transmission, i.e.,
$\alpha_1 = \beta_1 = 1$, then $\gamma_1 = \delta_1 = 1$, $Q_{11} = Q_{12} = 0$. From (17),

$$Q = \begin{bmatrix} Q_{21}' \\ Q_{22}' \\ Q_{31}'' \\ Q_{32}'' \end{bmatrix} \tag{24}$$

and

$$C = \log \left( 2^{C_2'} + 2^{C_3''} \right). \tag{25}$$

The case is equivalent to two ABC (wnm) in parallel.
The capacity of each channel, $C_2'$ and $C_3''$, is determined
by the noise probability matrices of the second digit,
$A_2'$ and $A_3''$, respectively. This point will be clearer if one
refers to the matrix $A$ of (12). Here the submatrices of
the upper right and lower left corners are replaced by 0,
and the doublets 00, 01, and 10, 11 supply in an equivalent
manner two separate binary channels with no interconnection.
It is also evident that the same result will be
obtained if $\alpha_1 = \beta_1 = 0$.


3) If the transmission property of the second digit is
independent of the first digit, although not necessarily
equal to that of the first digit, then

$$A_2' = A_3'' = A_2, \tag{26}$$

and

$$Q = \begin{bmatrix} Q_{11} + Q_{21} \\ Q_{11} + Q_{22} \\ Q_{12} + Q_{21} \\ Q_{12} + Q_{22} \end{bmatrix}. \tag{27}$$

It may be easily shown that, in this case,

$$C = C_1 + C_2, \tag{28}$$

where $C_1$ and $C_2$ are capacities of ABC (wnm) corresponding
to $A_1$ and $A_2$, respectively. If, further, $A_1 = A_2$, then
the channel is memory-less and

$$C = 2C_1 \ \text{bits/doublet}, \quad \text{or} \quad C = C_1 \ \text{bits/digit}, \tag{29}$$

as it should be.
4) The limiting conditions under which the channel
becomes either perfect or capacity-less can be seen more
easily by examining first the case of the ABC (wnm).
The perfect capacity ($C = 1$ bit/digit) is obtained when
either $\alpha = \beta = 1$, or $\alpha = \beta = 0$. The channel possesses
zero capacity when $\alpha + \beta = 1$, or the 2 × 2 determinant
of the single digit matrix $|A| = 0$.

The same conditions apply to the doublets, except that
all three submatrices $A_1$, $A_2'$, and $A_3''$ must satisfy the
same or complementary conditions, for example, as in
the following.

Perfect Channel:

$$A_1 = A_2' = A_3'' = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{or} \quad A_1 = A_2' = A_3'' = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \tag{30}$$

or any combinations of $A_1$, $A_2'$, and $A_3''$ which satisfy
either of the two conditions.


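$$
Q = \begin{bmatrix} Q_1 \\ Q_2 \\ Q_3 \\ Q_4 \end{bmatrix}
= \begin{bmatrix} Q_{11} \\ Q_{11} \\ Q_{12} \\ Q_{12} \end{bmatrix}
+ \begin{bmatrix}
\gamma_1 \begin{bmatrix} Q_{21}' \\ Q_{22}' \end{bmatrix} + (1 - \gamma_1) \begin{bmatrix} Q_{31}'' \\ Q_{32}'' \end{bmatrix} \\[2ex]
(1 - \delta_1) \begin{bmatrix} Q_{21}' \\ Q_{22}' \end{bmatrix} + \delta_1 \begin{bmatrix} Q_{31}'' \\ Q_{32}'' \end{bmatrix}
\end{bmatrix} \tag{17}
$$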


Zero-Capacity Channel:
The doublet matrix has identical rows and its 4 × 4
determinant

$$|A| = 0. \tag{31}$$
EXTENSION TO THE TRIPLET


The generalization from doublet to triplet transmission 
is relatively simple. Four noise probability matrices A4{, 
Aj’, Aj’’, and Aj’’’ should be added to represent the 
property of the channel transmission for the third digit 
following the four possible doublets. The over-all noise 
probability matrix for the triplet can be composed by the 
pyramid structure of these matrices and the previously 
defined matrices in a way similar to (13). 


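[Equation (32): composition of the over-all noise probability matrix for the triplet; the matrix layout is not legible in the source.]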

During the multiplication process, to be carried from the 
right to the left, the submatrices in the left should be first 
considered merely as elements and then expanded. 

Through postmultiplication of the inverted matrix with 
the negative of the noise entropy matrix, the auxiliary 
matrix Q is obtained again. The latter is expanded in a 
detailed manner which is directly applicable for the 
computation of C. This is shown as (33). 


[Equation (33): expanded form of the auxiliary matrix Q for the triplet; the matrix layout is not legible in the source.]


Despite the formidable appearance of this matrix it 
can be set up by simple rules as shown in the next section. 
Most important, all the Q;;’s can be read from Fig. 3 or 
Table I. This results in a considerable reduction in 
computation as opposed to 8 X 8 matrix operations or 
maximization processes with several constraints. 
ITERATIVE PROCEDURE FOR THE N-PLET

To extend the method to one which is applicable to
blocks of $n$ digits, it is desirable to adopt an iterative
procedure. The procedure is best described by letting
$q_1, q_2, q_3, \cdots, q_n$ be the notation of $Q$ for single digits, doublets,
triplets, $\cdots$, $n$ plets, respectively. Superscripts, 0 or 1, will
be used to differentiate whether the single digit, doublet,
etc., is preceded by "0" or "1." Thus $q_1^0$ is the $Q$ of a
single digit preceded by "0," and $q_2^1$ is the $Q$ of a doublet
preceded by "1." Also, the notation for the matrix
$\begin{bmatrix} Q_{11} \\ Q_{11} \end{bmatrix}$
where $Q_{11}$ is used twice will be simplified to read $[Q_{11}](2)$.
The same applies to the matrices of $Q_{12}$, and other parenthetical
subscripts representing the number of iterations.

With this notation at hand, it can be shown that the
equations for $q_1, q_2, q_3, \cdots, q_n$ can be expressed as

$$q_1 = \begin{bmatrix} Q_{11} \\ Q_{12} \end{bmatrix} \tag{34a}$$

$$q_2 = \begin{bmatrix} [Q_{11}](2) \\ [Q_{12}](2) \end{bmatrix} + \begin{bmatrix} \gamma_1 q_1^0 + (1 - \gamma_1) q_1^1 \\ (1 - \delta_1) q_1^0 + \delta_1 q_1^1 \end{bmatrix} \tag{34b}$$

$$q_3 = \begin{bmatrix} [Q_{11}](4) \\ [Q_{12}](4) \end{bmatrix} + \begin{bmatrix} \gamma_1 q_2^0 + (1 - \gamma_1) q_2^1 \\ (1 - \delta_1) q_2^0 + \delta_1 q_2^1 \end{bmatrix} \tag{34c}$$

$$q_n = \begin{bmatrix} [Q_{11}](2^{n-1}) \\ [Q_{12}](2^{n-1}) \end{bmatrix} + \begin{bmatrix} \gamma_1 q_{n-1}^0 + (1 - \gamma_1) q_{n-1}^1 \\ (1 - \delta_1) q_{n-1}^0 + \delta_1 q_{n-1}^1 \end{bmatrix} \tag{34d}$$

Thus the calculation of $q_n$ must be preceded by the
determination of the following number of $q$'s:

$2^{n-1}$ $q_1$'s
$2^{n-2}$ $q_2$'s
$2^{n-3}$ $q_3$'s
$\cdots$
$2$ $q_{n-1}$'s.

The last two $q_{n-1}$ matrices are then substituted into (34d)
to obtain $q_n$. The latter is a column matrix with $2^n$ rows.

It can be easily verified, for example, that this procedure
will yield the form of $q_2$ as given by (17) for the
doublet and that of $q_3$ as given by (33) for the triplet.

Properties similar to those discussed under the section
on the doublet can also be extended and proved for the
general case of the $n$ plet.


PROBABILITIES FOR THE RECEIVED AND 
TRANSMITTED SYMBOLS 


The channel capacity can be fully realized only if the
symbols, here referred to as the $n$ plets, are transmitted
and received with appropriate probabilities. According
to Muroga, the probabilities of the received symbols for
the capacity realization are given by the following column
matrix:

$$P(y) = \left[ 2^{Q_i - C} \right]. \tag{35}$$

It is evident that the elements of $P(y)$ are nonnegative
and add up to unity.

Recalling the fact that the noise probability matrix $A$
contains as elements the conditional probabilities $p(y_j \mid x_i)$,
the probabilities of the transmitted symbol corresponding
to (35) are given by the row matrix

$$P(x) = [P(y)]^{T} A^{-1}. \tag{36}$$

A numerical example is given by assigning to $\alpha$'s and
$\beta$'s the same values as in (20). The use of (35) and (36)
gives the results:

$$
P(y) = \begin{bmatrix} 0.227043 \\ 0.246224 \\ 0.235200 \\ 0.291533 \end{bmatrix}
\quad \text{and} \quad
P(x) = \begin{bmatrix} 0.277784 \\ 0.255454 \\ 0.172266 \\ 0.294495 \end{bmatrix} \tag{37}
$$

where the four rows are arranged according to the order
of the doublets 00, 01, 10, and 11, respectively. The sum
of the elements of $P(y)$ and that of $P(x)$ are both equal
to unity.
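The figures in (37) can be cross-checked by working directly with the 4 × 4 matrix instead of the tabulated $Q_{ij}$'s. The sketch below is an illustration, not the procedure of the paper. It assumes that the doublet matrix has entries $p(y_1 y_2 \mid x_1 x_2) = A_1[x_1, y_1] \cdot A[x_2, y_2]$, where the second-digit matrix $A$ is $A_2'$ after a transmitted 0 and $A_3''$ after a transmitted 1, and it uses a small hand-written Gaussian elimination in place of an explicit matrix inverse.

```python
from math import log2


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def doublet_matrix(a1, b1, a2, b2, a3, b3):
    """Assumed 4 x 4 noise matrix; rows and columns ordered 00, 01, 10, 11."""
    first = [[a1, 1 - a1], [1 - b1, b1]]
    second = [
        [[a2, 1 - a2], [1 - b2, b2]],  # second digit after a transmitted 0
        [[a3, 1 - a3], [1 - b3, b3]],  # second digit after a transmitted 1
    ]
    return [
        [first[x1][y1] * second[x1][x2][y2] for y1 in (0, 1) for y2 in (0, 1)]
        for x1 in (0, 1)
        for x2 in (0, 1)
    ]


A = doublet_matrix(0.80, 0.90, 0.70, 0.75, 0.50, 0.60)
H = [-sum(p * log2(p) for p in row if p > 0) for row in A]  # row entropies
Q = solve(A, [-h for h in H])                 # Q = -A^{-1} H
C = log2(sum(2.0 ** q for q in Q))            # capacity, as in (19)
P_y = [2.0 ** (q - C) for q in Q]             # received probabilities, (35)
A_T = [list(col) for col in zip(*A)]
P_x = solve(A_T, P_y)                         # P(x) A = P(y), i.e. (36)
print([round(p, 6) for p in P_y])
print([round(p, 6) for p in P_x])
```

Solving the transposed system avoids forming $A^{-1}$ explicitly; for larger $n$ plets the same two solves apply to the $2^n \times 2^n$ matrix, at the cost that the tabulated shortcut is meant to avoid.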

As pointed out by Muroga, some of the probabilities of
the transmitted symbols, in certain conditions not explicitly
known, may turn out to be negative and thus are
not physically possible. It then is necessary to suppress
some of the transmitted symbols for capacity calculation
and realization. However, in the ABC (wnm) it has been
demonstrated by Silverman that these required probabilities
are always positive and lie in the range $1/e$ to
$1 - 1/e$. In the case of doublets, if $A_2' = A_3'' = A_2$, it can
be shown that this property is again satisfied by each
digit. Actually the capacity under this condition is given
by $C = C_1 + C_2$ as derived previously. The two digits in
such a doublet are transmitted independently and the
required probability for each doublet can be shown to be
the product of the probabilities of the first and second
digit when coded to realize the capacities $C_1$ and $C_2$,
respectively. There is no guarantee of the nonnegativeness
of the elements of $P(x)$ in the general case of ABC (wfm).
A more complicated procedure of capacity computation
will have to be used under adverse conditions.


Discussion 


It has been shown in this paper that when equal blocks 
of binary digits are used to transmit or store information 
through an asymmetrical channel with finite memory, 
the capacity of this channel can be calculated relatively 
easily by the use of curves or a table. The need for the 
consideration of channel memory arises from such phe- 
nomena as multipath transmission and interdigit inter- 
ference due to limited bandwidth-time product of the 
transmitting and receiving equipment and of the trans- 
mission medium. It may arise also when the successive 
digits are intentionally assigned with different weights 
(weighted PCM). Since the channel capacity is the least
upper bound of the rate of transmission, it serves as an
important criterion by means of which the performance


measured. The determination of the noise probabilities
required in the capacity calculation of various channels
forms an important subject in itself. The design of
appropriate coding schemes to match with the asymmetrical
channel with finite memory will be an intriguing
problem for mathematicians and communication engineers.
Some calculations have been made recently on a hypothetical
channel using the present method, which reveal
certain interesting relations between the effects of the
signal-to-noise ratio and the bandwidth (upon which the


Summary—This paper considers the fourth product moment,
$w(\tau_1, \tau_2, \tau_3) = E[x(t)x(t + \tau_1)x(t + \tau_2)x(t + \tau_3)]$, when $x(t)$ is
infinitely clipped noise with a mean value of zero. If the noise is
Gaussian before clipping, the moment $w$ is not obtainable in closed
form. For this reason, the Gaussian assumption is withdrawn and
other assumptions are employed. If the zeros of $x(t)$ obey the
Poisson distribution, a particularly simple result follows for $w$ and
for all higher moments. An alternative assumption is the following.
Let unspecified events occur at times $t_0, t_1, t_2, \cdots$ according to
the Poisson distribution. If alternate events, i.e., those at $t_1, t_3,
t_5, \cdots$, are designated as the zeros of $x(t)$, both the autocorrelation
function and $w(\tau_1, \tau_2, \tau_3)$ can be derived. The results are in
terms of elementary functions. A comparison is made between
these models and clipped Gaussian processes.


INTRODUCTION 


CONSIDER a random process $\xi(t)$ which is both
stationary and ergodic. The mean value $E[\xi(t)]$
is zero and the process is symmetric about the mean.

Let $x(t)$ denote the output after the process $\xi(t)$ is
infinitely clipped; i.e., let

$$x(t) = \begin{cases} 1 & \text{if } \xi(t) \ge 0, \\ -1 & \text{if } \xi(t) < 0. \end{cases} \tag{1}$$

Now if the input $\xi(t)$ is assumed to be Gaussian, the
autocorrelation function of the output $x(t)$, namely
$r(\tau) = E[x(t)x(t + \tau)]$, is derivable from the normalized
autocorrelation function of $\xi(t)$ by a well-known arcsine
formula.¹ On the other hand, for the fourth product
moment,


* Manuscript received by the PGIT, June 2, 1958.

† School of Elec. Eng., Purdue University, Lafayette, Ind.

¹ See, for example, J. L. Lawson and G. E. Uhlenbeck, "Threshold
Signals," McGraw-Hill Book Co., Inc., New York, N. Y., p.
58; 1950.
memory depends) on the capacity of the channel. It is
hoped to publish these results later.
J. A. McFADDEN†


$$w(\tau_1, \tau_2, \tau_3) = E[x(t)x(t + \tau_1)x(t + \tau_2)x(t + \tau_3)], \tag{2}$$


no such simple formula is known.

The moment $w(\tau_1, \tau_2, \tau_3)$ is often needed in problems
concerning clipped noise. For example, if the autocorrelation
function $r(\tau)$ is to be estimated by means of a time
average over a finite time, the variance of this estimator
involves a double indefinite integral of $w(\tau_1, \tau_2, \tau_3)$ with
respect to the arguments.²

The moment $w(\tau_1, \tau_2, \tau_3)$ is closely related to the
quadrivariate normal integral, which is the probability
that four random variables which are jointly Gaussian
are simultaneously positive.³,⁴ That integral has not been
solved in closed form except for special numerical values
of the correlation matrix. Special solutions are not usually
sufficient, since the solution for $w(\tau_1, \tau_2, \tau_3)$ must be
known for all values of the $\tau_i$ so that it can be integrated
with respect to the $\tau_i$.

For the same reason, numerical evaluation of the
quadrivariate normal integral is not of much use in noise
problems of the type mentioned above.

What is really needed is a simple expression containing
easily integrable functions of $\tau_1$, $\tau_2$, and $\tau_3$. Such a solution
is apparently not available under the given assumptions.


² J. H. Laning, Jr., and R. H. Battin, "Random Processes in
Automatic Control," McGraw-Hill Book Co., Inc., New York,
N. Y.; 1956.

³ J. A. McFadden, "The axis-crossing intervals of random functions
II," IRE Trans. on Information Theory, vol. IT-4, pp.
14-24; March, 1958. See (34).

⁴ J. A. McFadden, "Urn models of correlation and a comparison
with the multivariate normal integral," Ann. Math. Stat., vol. 26,
pp. 478-489; September, 1955. See sec. 6 and bibliography.

Also, J. A. McFadden, "An approximation for the symmetric,
quadrivariate normal integral," Biometrika, vol. 43, pp. 206-207;
June, 1956.
[In fact, even the correlation function $r(\tau)$ is not a particularly
simple function of $\tau$, i.e., one that can be integrated
in closed form.]

What then can be done in a practical problem in which
the moment $w(\tau_1, \tau_2, \tau_3)$ is needed? The method of this
paper is to remove the assumption that $\xi(t)$ is Gaussian
and to prescribe instead the properties of the axis crossings
of $\xi(t)$ [which are also the axis crossings of $x(t)$].

Two different assumptions are discussed:

a) The zeros obey the Poisson distribution. This is one
of Rice's "random telegraphic signals."⁵

b) A random sequence of events occurs at various
instants according to the Poisson distribution, and
alternate events are designated as the zeros of $x(t)$.


DERIVATION 


The results for $w(\tau_1, \tau_2, \tau_3)$ are now derived.

Consider a random sequence of time intervals which
end at the instants $t_0, t_1, t_2, \cdots$. The number of these
instants occurring between the times $t$ and $t + \tau$ obeys
the Poisson distribution; that is, the probability that this
number is exactly $n$ is

$$h(n, \tau) = \frac{(a\tau)^n}{n!} e^{-a\tau}, \tag{3}$$

where $a$ is the expected number of these instants occurring
in a unit interval of time.

Let the function $x(t)$ assume the constant value $a_n$ over
the interval $t_{n-1} < t < t_n$, where $n = 1, 2, 3, \cdots$. Each
$a_n$ is a random variable which assumes the value $+1$ or
$-1$ with probability one half. The successive values
$a_1, a_2, a_3, \cdots$ are not independent.⁶ The form of
dependence is prescribed later.

Consider now the product $x(t)x(t + \tau_1)x(t + \tau_2)x(t + \tau_3)$,
where $0 < \tau_1 < \tau_2 < \tau_3$. If $t$ falls between $t_{k-1}$
and $t_k$, then $x(t) = a_k$. Let the number of end-points of
intervals between $t$ and $t + \tau_1$ be $l$. Let the number between
$t + \tau_1$ and $t + \tau_2$ be $m$, and let the number between
$t + \tau_2$ and $t + \tau_3$ be $n$. Then the fourfold product is

$$x(t)x(t + \tau_1)x(t + \tau_2)x(t + \tau_3) = a_k a_{k+l} a_{k+l+m} a_{k+l+m+n}. \tag{4}$$

Then the expected value of this product is the product
of the $a$'s times the joint probability that exactly $l$, $m$,
and $n$ intervals end, respectively, between $t$ and $t + \tau_1$,
between $t + \tau_1$ and $t + \tau_2$, and between $t + \tau_2$ and $t + \tau_3$,
summed over all the values of $l$, $m$, and $n$, and averaged
over $k$. But since the instants $t_i$ occur according to the
Poisson distribution, it follows that the numbers $l$, $m$, and
$n$ are statistically independent. Then $w(\tau_1, \tau_2, \tau_3)$, the
expected value of (4), is


⁵ S. O. Rice, "Mathematical analysis of random noise," Bell Sys.
Tech. J., vol. 23, pp. 282-332; July, 1944, and vol. 24, pp. 46-156;
January, 1945. See sec. 2.7.

⁶ Cf. H. M. James, N. B. Nichols, and R. S. Phillips, "Theory of
Servomechanisms," McGraw-Hill Book Co., Inc., New York,
N. Y., p. 301; 1947, where the $a_n$ are independent. The present
method is a generalization of the derivation on pp. 301-303.
$$w(\tau_1, \tau_2, \tau_3) = \overline{\sum_{l=0}^{\infty} \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} S_{klmn}\, h(l, \tau_1)\, h(m, \tau_2 - \tau_1)\, h(n, \tau_3 - \tau_2)}^{\,k}, \tag{5}$$

where the overbar with the index $k$ denotes the average over $k$ and where

$$S_{klmn} = a_k a_{k+l} a_{k+l+m} a_{k+l+m+n}. \tag{6}$$

The next step is to specify $S_{klmn}$ for particular models.
Under assumption a), the zeros of $x(t)$ obey the Poisson
distribution. This requirement can easily be imposed on
(5) if it is prescribed that

$$a_{j+1} = -a_j; \tag{7}$$

that is, the sign of $x(t)$ must change at each of the instants
$t_j$. Then

$$
\begin{aligned}
a_{k+l} &= (-1)^{l} a_k, \\
a_{k+l+m} &= (-1)^{l+m} a_k, \\
a_{k+l+m+n} &= (-1)^{l+m+n} a_k;
\end{aligned} \tag{8}
$$

therefore

$$S_{klmn} = (-1)^{3l+2m+n} a_k^4 = (-1)^{l+n}. \tag{9}$$

Since this result is independent of $k$, it need not be
averaged over $k$. Then the triple sum (5) is easily evaluated;
the result is⁷

$$w(\tau_1, \tau_2, \tau_3) = e^{-2a(\tau_1 + \tau_3 - \tau_2)}. \tag{10}$$

Note that $w(\tau_1, \tau_2, \tau_3)$ is completely specified if the first
and third time differences, $\tau_1$ and $(\tau_3 - \tau_2)$, are known;
the second one, $(\tau_2 - \tau_1)$, is not needed.

Now for this model the autocorrelation function is⁵

$$r(\tau) = e^{-2a|\tau|}; \tag{11}$$

therefore (10) can be written

$$w(\tau_1, \tau_2, \tau_3) = r(\tau_1)\, r(\tau_3 - \tau_2). \tag{12}$$
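The step from the triple sum (5) to the closed form (10) can be checked numerically. The following sketch truncates each of the three sums after a fixed number of terms (the cutoff `terms`, the rate `a_rate`, and the sample delays are arbitrary choices made here, not values from the text) and compares the result with $e^{-2a(\tau_1 + \tau_3 - \tau_2)}$.

```python
from math import exp, factorial


def poisson_pmf(n, tau, a):
    """h(n, tau) of (3): probability of exactly n instants in an interval tau."""
    return (a * tau) ** n * exp(-a * tau) / factorial(n)


def w_triple_sum(tau1, tau2, tau3, a, terms=40):
    """Truncated triple sum (5) with S_klmn = (-1)**(l + n) from (9)."""
    h_l = [poisson_pmf(l, tau1, a) for l in range(terms)]
    h_m = [poisson_pmf(m, tau2 - tau1, a) for m in range(terms)]
    h_n = [poisson_pmf(n, tau3 - tau2, a) for n in range(terms)]
    total = 0.0
    for l in range(terms):
        for m in range(terms):
            for n in range(terms):
                total += (-1) ** (l + n) * h_l[l] * h_m[m] * h_n[n]
    return total


def w_closed_form(tau1, tau2, tau3, a):
    """Closed form (10), valid for 0 < tau1 < tau2 < tau3."""
    return exp(-2.0 * a * (tau1 + tau3 - tau2))


a_rate = 1.3
taus = (0.2, 0.7, 1.1)
w_sum = w_triple_sum(*taus, a_rate)
w_exact = w_closed_form(*taus, a_rate)
print(w_sum, w_exact)
```

The sum over $m$ contributes a factor of unity, which is why the middle time difference drops out of (10).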


This result can be extended to the higher moments of
even order. Let $x_1 = x(t)$ and let $x_{i+1} = x(t + \tau_i)$, where
$i = 1, 2, 3, \cdots$ and $\tau_{i+1} > \tau_i$. It has been shown in (12)
that

$$E(x_1 x_2 x_3 x_4) = E(x_1 x_2)\, E(x_3 x_4). \tag{13}$$

It follows directly that the products $x_1 x_2$ and $x_3 x_4$ are
uncorrelated. But since $x_1 x_2$ and $x_3 x_4$ can each assume
only two values, $+1$ and $-1$, the products must be
independent.⁸ In this case it can be proved also that
all the successive products $x_1 x_2, x_3 x_4, x_5 x_6, \cdots$ are mutually
independent; therefore⁹

$$
\begin{aligned}
E(x_1 x_2 \cdots x_{2n}) &= E(x_1 x_2)\, E(x_3 x_4) \cdots E(x_{2n-1} x_{2n}) \\
&= e^{-2a\tau_1}\, e^{-2a(\tau_3 - \tau_2)} \cdots e^{-2a(\tau_{2n-1} - \tau_{2n-2})}.
\end{aligned} \tag{14}
$$

⁷ Cf. T. A. Magness, "Spectral response of a quadratic device,"
J. Appl. Phys., vol. 25, pp. 1357-1365; November, 1954. Eq. (57).

⁸ W. Feller, "An Introduction to Probability Theory and Its
Applications," John Wiley and Sons, Inc., New York, N. Y., vol.
1, 1st ed., p. 189; 1950. See problem 21.

⁹ This result was first shown to the author by G. R. Cooper in
his memorandum, "System measurement with binary noise," (to be
published).


The fourth moment is now derived under assumption
b). In this case, alternate instants $t_1, t_3, t_5, \cdots$ are
designated as the zeros of the process $x(t)$. In other words,

$$a_{2j} = -a_{2j-1}, \qquad a_{2j+1} = a_{2j}, \tag{15}$$

where $j = 1, 2, 3, \cdots$. Then there are three cases to be
considered. The results are as follows.

1) If $l$ and $n$ are both even,

$$S_{klmn} = (-1)^{(l+n)/2}. \tag{16}$$

2) If $l$ and $n$ are both odd, then

$$S_{klmn} = (-1)^{m + (l+n)/2}. \tag{17}$$

3) If $l$ is odd and $n$ is even, then

$$S_{klmn} = (-1)^{(l \pm 1)/2 + n/2}. \tag{18}$$

But since $k$ can be either even or odd, the average over
$k$ is zero. The same is true for the case in which $l$ is even
and $n$ is odd.

Substitution of the above results into (5) yields the
following:

$$
\begin{aligned}
w(\tau_1, \tau_2, \tau_3)
&= \sum_{l\ \mathrm{even}} \sum_{m} \sum_{n\ \mathrm{even}} (-1)^{(l+n)/2}\, h(l, \tau_1)\, h(m, \tau_2 - \tau_1)\, h(n, \tau_3 - \tau_2) \\
&\quad + \sum_{l\ \mathrm{odd}} \sum_{m} \sum_{n\ \mathrm{odd}} (-1)^{m + (l+n)/2}\, h(l, \tau_1)\, h(m, \tau_2 - \tau_1)\, h(n, \tau_3 - \tau_2) \\
&= e^{-a(\tau_1 + \tau_3 - \tau_2)} \left[ \cos a\tau_1 \cos a(\tau_3 - \tau_2) - e^{-2a(\tau_2 - \tau_1)} \sin a\tau_1 \sin a(\tau_3 - \tau_2) \right].
\end{aligned} \tag{19}
$$

The autocorrelation function of this process may be
obtained as a special case. If $\tau_3 \to \tau_2$, then $w(\tau_1, \tau_2, \tau_3)$
becomes

$$w(\tau_1, \tau_2, \tau_2) = E[x(t)x(t + \tau_1)] = r(\tau_1).$$

With $\tau_3 = \tau_2$ and $\tau_1 = \tau$, (19) becomes

$$r(\tau) = e^{-a|\tau|} \cos a\tau. \tag{20}$$

The method used in obtaining (19) may be extended
to the case of higher moments of even order, but with
increasing complexity.
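The three cases and the closed form (19) can be verified by brute force. In the sketch below the sign pattern of the alternate model is written as $a_j = (-1)^{\lfloor j/2 \rfloor}$, which flips the sign at the odd-numbered instants only; this explicit form, the averaging over $k = 1, 2$ (one odd and one even starting interval, taken as equally likely), and all function names are assumptions introduced for the illustration.

```python
from math import cos, exp, factorial, sin


def interval_sign(j):
    """Sign a_j of x(t) on (t_{j-1}, t_j) when zeros occur at t_1, t_3, t_5, ..."""
    return -1 if (j // 2) % 2 else 1


def s_average(l, m, n):
    """Average of S_klmn over k odd and k even (assumed equally likely)."""
    total = 0
    for k in (1, 2):
        total += (interval_sign(k) * interval_sign(k + l)
                  * interval_sign(k + l + m) * interval_sign(k + l + m + n))
    return total / 2.0


def poisson_pmf(n, tau, a):
    """h(n, tau) of (3)."""
    return (a * tau) ** n * exp(-a * tau) / factorial(n)


def w_alternate_sum(tau1, tau2, tau3, a, terms=30):
    """Truncated triple sum (5) for the alternate Poisson model b)."""
    total = 0.0
    for l in range(terms):
        for m in range(terms):
            for n in range(terms):
                total += (s_average(l, m, n) * poisson_pmf(l, tau1, a)
                          * poisson_pmf(m, tau2 - tau1, a)
                          * poisson_pmf(n, tau3 - tau2, a))
    return total


def w_alternate_closed(tau1, tau2, tau3, a):
    """Closed form (19), valid for 0 < tau1 < tau2 < tau3."""
    d = tau3 - tau2
    return exp(-a * (tau1 + d)) * (
        cos(a * tau1) * cos(a * d)
        - exp(-2.0 * a * (tau2 - tau1)) * sin(a * tau1) * sin(a * d)
    )


w_sum_b = w_alternate_sum(0.3, 0.8, 1.4, 1.0)
w_closed_b = w_alternate_closed(0.3, 0.8, 1.4, 1.0)
print(w_sum_b, w_closed_b)
```

Because `s_average` is computed from the sign sequence itself, agreement with the closed form checks (16) to (18) and (19) together.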



DISCUSSION 


Of what value are the above results when the actual
problem involves infinitely clipped Gaussian noise?

Consider a Gaussian process $\xi(t)$ which has the normalized
autocorrelation function {i.e., $E[\xi(t)\xi(t + \tau)]$ divided
by the variance},

$$\rho(\tau) = \sin \left[ \frac{\pi}{2} r(\tau) \right], \tag{21}$$

where $r(\tau)$ is given by (11). (It is not clear whether such
a correlation function is possible; it remains to be proved
that the corresponding spectral density is a nonnegative
function.) Provided such a Gaussian process exists, the
autocorrelation function $r(\tau)$ [after the clipping process
(1)] is identical with (11), because of the arcsine formula.¹

However, the clipped Gaussian process cannot be
identical with the Poisson model a). It can be shown,¹⁰,¹¹
for certain types of correlation functions, that the successive
axis-crossing intervals of a Gaussian process
cannot be statistically independent. This argument
makes the Poisson model incompatible with the Gaussian
assumption.

Evidently such a clipped Gaussian process and the
Poisson model have the same autocorrelation function
(11) but are not the same process; therefore, some of their
higher-order moments must differ. It remains to be
shown how well the fourth product moment (10) compares
with the $w(\tau_1, \tau_2, \tau_3)$ for the corresponding clipped Gaussian
process.

As stated earlier, $w(\tau_1, \tau_2, \tau_3)$ has not been obtained in
closed form for a clipped Gaussian process. However,
there is one limiting case which can be used for comparison.
This limiting case occurs when $0 < \tau_1 < \tau_2 < \tau_3$
and the variables $\tau_1$ and $\tau_3 - \tau_2$ become infinitesimal.

It has been shown,¹² for a symmetric process (not
necessarily Gaussian), that the limiting form of the
fourth product moment after clipping is simply related
to the function $U(\tau)$, where $U(\tau)\,d\tau$ is the conditional
probability that a zero will occur between $t + \tau$ and
$t + \tau + d\tau$, given a zero at time $t$. The relation is the
following:

$$U(\tau) = \frac{1}{4\beta} \left. \frac{\partial^2 w(\tau_1, \tau_2, \tau_3)}{\partial \tau_1\, \partial \tau_3} \right|_{\tau_1 = 0+,\ \tau_2 = \tau,\ \tau_3 = \tau+}, \tag{22}$$

where $\beta$ is the expected number of axis crossings per
unit time.

Now $U(\tau)$ is known both for the Poisson model a) and
for a Gaussian process. For the Poisson model, the
moment (10) may be substituted into (22), with $\beta = a$,
and the result is a constant,

$$U(\tau) = a, \tag{23}$$

for all $\tau > 0$. This result checks the basic Poisson assumption,
namely that the probability of a zero occurring
between $t$ and $t + dt$ is $a\,dt$, independent of the location
of all previous zeros.

For a Gaussian process $U(\tau)\,d\tau$ has been calculated by
Rice.¹³ If the autocorrelation function (21) is substituted
into Rice's formula, using (11) for $r(\tau)$, the result is somewhat
different from (23). $U(\tau)$ is a monotonically decreasing
function, beginning¹⁴ with $U(0) = 1.436a$ and
approaching the value $a$ as $\tau \to \infty$. The decay is roughly
exponential, with a time constant close to $1/(4a)$, i.e.,
one quarter of the average time between axis crossings.

¹⁰ McFadden, footnote 3, p. 17.

¹¹ D. S. Palmer, "Properties of random functions," Proc. Camb.
Phil. Soc., vol. 52, pp. 672-686; October, 1956. See pp. 679-680.

¹² McFadden, footnote 3, (11).

¹³ Rice, op. cit., (3.4-10).

It may be concluded that the Poisson model a) is not
a very good approximation to a clipped Gaussian process;
yet (10) is still recommended for the fourth product
moment because of its simplicity and the absence of more
suitable models. It should be tried whenever the autocorrelation
function $\rho(\tau)$ of the original Gaussian process
$\xi(t)$ has a behavior somewhat like (21), where $r(\tau)$ is given
by (11). $a$ should be chosen equal to $\beta$, the expected
number of zeros per unit time.¹⁵

A similar argument may be applied to the alternate
Poisson model b). Consider a Gaussian process with an
autocorrelation function (21), where now $r(\tau)$ is given by
(20). Provided this process exists, it may be compared,
after clipping, with the alternate Poisson model.

As before, the two processes cannot be identical.
Although the previous arguments¹⁰,¹¹ do not apply to
this particular $\rho(\tau)$, a slight extension shows that the
successive axis-crossing intervals of this Gaussian process
must also be statistically dependent. This argument
makes the alternate Poisson model incompatible with the
Gaussian assumption.

$U(\tau)$ may be calculated for the alternate Poisson model
by substituting the moment (19) into (22) with $\beta = a/2$.
The result is the following:¹⁶

$$U(\tau) = \frac{a}{2} \left( 1 - e^{-2a\tau} \right) \tag{24}$$

for all $\tau > 0$.
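Relation (22) can be checked against both models with a finite-difference approximation of the mixed derivative. The sketch below is illustrative only: the offsets `eps` and `h` that stand in for the one-sided limits $0+$ and $\tau+$ are arbitrary small numbers chosen here, and the closed forms (10) and (19) are used as the moment functions.

```python
from math import cos, exp, sin


def w_poisson(tau1, tau2, tau3, a):
    """Fourth product moment (10) for the Poisson model a)."""
    return exp(-2.0 * a * (tau1 + tau3 - tau2))


def w_alternate(tau1, tau2, tau3, a):
    """Fourth product moment (19) for the alternate Poisson model b)."""
    d = tau3 - tau2
    return exp(-a * (tau1 + d)) * (
        cos(a * tau1) * cos(a * d)
        - exp(-2.0 * a * (tau2 - tau1)) * sin(a * tau1) * sin(a * d)
    )


def u_from_moment(w, tau, a, beta, eps=1e-4, h=5e-5):
    """Approximate (22): mixed derivative of w near tau1 = 0+, tau3 = tau+."""
    t1, t2, t3 = eps, tau, tau + eps
    mixed = (
        w(t1 + h, t2, t3 + h, a) - w(t1 + h, t2, t3 - h, a)
        - w(t1 - h, t2, t3 + h, a) + w(t1 - h, t2, t3 - h, a)
    ) / (4.0 * h * h)
    return mixed / (4.0 * beta)


a_rate, tau = 1.0, 0.5
u_a = u_from_moment(w_poisson, tau, a_rate, beta=a_rate)          # compare with (23)
u_b = u_from_moment(w_alternate, tau, a_rate, beta=a_rate / 2.0)  # compare with (24)
print(u_a, u_b, 0.5 * a_rate * (1.0 - exp(-2.0 * a_rate * tau)))
```

The agreement is only approximate, because the derivative is evaluated at small positive offsets and not in the limit.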

For the corresponding Gaussian process, the function
$U(\tau)$ is quite similar to (24). It begins at $\tau = 0$ with
exactly the same initial value, slope, and curvature as
(24), then rises slightly above (24), and approaches the
asymptotic value $a/2$ while undergoing small oscillations.

¹⁴ McFadden, footnote 3, (24).

¹⁵ Rice, op. cit., (3.3-11).

¹⁶ Cf. M. S. Bartlett, "An Introduction to Stochastic Processes,"
Cambridge University Press, Cambridge, Eng., p. 167; 1955.

The close agreement of $U(\tau)$ provides only a partial
check on (19) for the fourth product moment. Again,
however, the result is recommended because of its
simplicity. It should be used when the autocorrelation
function $\rho(\tau)$ of the original Gaussian process has a
behavior somewhat like (21), where $r(\tau)$ is given by (20).
$a$ should be chosen equal to $2\beta$, where $\beta$ is the expected
number of zeros per unit time.

An interesting problem for further research would be
to devise other comparisons which might prove how
closely the alternate Poisson model b) approximates a
clipped Gaussian process.

One method of comparison is suggested by some recent
work of Longuet-Higgins,¹⁷ who has obtained certain
derivatives of the quadrivariate normal integral. The
relation to the fourth product moment after clipping has
been described previously.³ In terms of $w(\tau_1, \tau_2, \tau_3)$,
Longuet-Higgins' results would provide the following.

Let the observations on $x(t)$ be performed at equal
intervals. That is, let $\tau_1 = \tfrac{1}{3}\tau_3$ and $\tau_2 = \tfrac{2}{3}\tau_3$. Furthermore,
let $t_1 = t$, $t_2 = t + \tau_1$, $t_3 = t + \tau_2$, and $t_4 = t + \tau_3$. Under
these conditions, the results would provide expressions for

$$\frac{\partial^2 w}{\partial t_i\, \partial t_j}$$

in terms of the autocorrelation function $\rho(\tau)$.

Unfortunately these results are extremely complicated;
yet if they were obtained numerically for the cases in
question they would provide another comparison of
clipped Gaussian noise with the models a) and b).

Another possibility for a comparison would be to
evaluate the quadrivariate normal integral numerically
for certain values of $\tau_1$, $\tau_2$, and $\tau_3$, using the method
given by Plackett.¹⁸


¹⁷ M. S. Longuet-Higgins, "On the intervals between successive
zeros of a random function," Proc. Roy. Soc., London, vol. A246,
pp. 99-118; July, 1958.

¹⁸ R. L. Plackett, "A reduction formula for multivariate normal
integrals," Biometrika, vol. 41, pp. 351-360; December, 1954.


Summary—It is shown that the envelope of a narrow-band
Gaussian noise constitutes a first-order Markoff process if the
power spectrum of the noise is the same as would be obtained
from a singly tuned RLC filter with white noise at the input.


INTRODUCTION


IN his famous paper on random noise, Rice¹ derived
the joint probability density function for two
samples from the envelope of a narrow-band Gaussian
noise. The problem of extending the solution to $N$ samples
of the envelope is very difficult. Kac and Siegert² and
Hoffman³ gave general solutions, although not in closed
form. Since these solutions place no restrictions on the
power spectrum of the noise, the results are intractable
to most further manipulations.

If we restrict our attention to a power spectrum of the
form

$$w(f) = \frac{1}{(f - f_0)^2 + F^2},$$

with $f$ the frequency, $f_0$ the center frequency, and $F$
proportional to bandwidth, it is possible to arrive at a
simple joint density function for $N$ samples of the envelope.
Such a spectrum is approximately that of the output of
a high-Q singly tuned RLC filter with white Gaussian
noise at the input.

In addition, it can be shown that the envelope of noise
with this power spectrum constitutes a first-order Markoff
process and is completely specified by the joint density
function for two samples. The author shows this by the
same method as used by Wang and Uhlenbeck,⁴ that is,
by showing that the conditional density function of the
$N$th envelope sample, given the $(N - 1)$ preceding
envelope samples, is identical with the conditional density
function given the immediately preceding sample. To
reduce notational difficulties, however, a full derivation
is given for only three envelope samples, with modifications
of the derivation to extend it to $N$ samples. In
the following, the term "Markoff" will be taken to mean
first-order Markoff.


* Manuscript received by the PGIT, July 16, 1958; revised
manuscript received, August 6, 1958.

† Air Res. and Dev. Command, AF Cambridge Res. Center,
Bedford, Mass.

¹ S. O. Rice, "Mathematical analysis of random noise," Bell Sys.
Tech. J., vol. 24, pp. 75-79; January, 1945.

² M. Kac and A. J. F. Siegert, "On the theory of noise in radio
receivers with square law detectors," J. Appl. Phys., vol. 18, pp.
396-397; April, 1947.

³ W. C. Hoffman, "The joint distributions of n successive outputs
of a linear detector," J. Appl. Phys., vol. 25, pp. 1006-1007;
August, 1954.

⁴ M. C. Wang and G. E. Uhlenbeck, "On the theory of Brownian
motion II," Rev. Mod. Phys., vol. 17, pp. 323-342; April-July, 1945.
JOINT DENSITY FUNCTION FOR THREE SAMPLES


Adhering closely to Rice's notation in the following
derivation, let $w(f)$ be the power spectrum of a narrow-band
Gaussian noise, and let $f_m$ be a representative
midband frequency. Consider the three functions

$$
\begin{aligned}
I_1(t) &= I(t), \\
I_2(t) &= I(t + \tau_{1,2}), \\
I_3(t) &= I(t + \tau_{1,2} + \tau_{2,3}),
\end{aligned} \tag{1}
$$

where $I(t)$ is the time function having the given power
spectrum, $\tau_{1,2}$ is the delay between the first and second
functions, and $\tau_{2,3}$ is the delay between the second and
third. We can represent each function as being the sum
of cosine and sine (in-phase and quadrature) components
at the midband frequency:

$$
\begin{aligned}
I_1 &= I_{c1} \cos \omega_m t - I_{s1} \sin \omega_m t, \\
I_2 &= I_{c2} \cos \omega_m t - I_{s2} \sin \omega_m t, \\
I_3 &= I_{c3} \cos \omega_m t - I_{s3} \sin \omega_m t,
\end{aligned} \tag{2}
$$

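where $\omega_m = 2\pi f_m$ and $I_{c1}, \cdots, I_{s3}$ vary at a rate slow
compared with the midband frequency. For each of the
three functions, we define the envelope $R$ of the narrow-band
function by

$$
\begin{aligned}
R^2 &= I_c^2 + I_s^2, \\
I_c &= R \cos \theta, \\
I_s &= R \sin \theta.
\end{aligned} \tag{3}
$$

The six components, which we order in a row matrix,

$$X = [I_{c1},\ I_{c2},\ I_{c3},\ I_{s1},\ I_{s2},\ I_{s3}], \tag{4}$$

constitute a six-dimensional normal process with a
moment matrix,

$$
[M] = \begin{bmatrix}
\psi_0 & \mu_{1,2} & \mu_{1,3} & 0 & \nu_{1,2} & \nu_{1,3} \\
\mu_{1,2} & \psi_0 & \mu_{2,3} & -\nu_{1,2} & 0 & \nu_{2,3} \\
\mu_{1,3} & \mu_{2,3} & \psi_0 & -\nu_{1,3} & -\nu_{2,3} & 0 \\
0 & -\nu_{1,2} & -\nu_{1,3} & \psi_0 & \mu_{1,2} & \mu_{1,3} \\
\nu_{1,2} & 0 & -\nu_{2,3} & \mu_{1,2} & \psi_0 & \mu_{2,3} \\
\nu_{1,3} & \nu_{2,3} & 0 & \mu_{1,3} & \mu_{2,3} & \psi_0
\end{bmatrix}. \tag{5}
$$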

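From Rice's equation pair (3.7-11) and its preceding
material, the coefficients in the moment matrix are
given by

$$
\begin{aligned}
\psi_0 &= \int_0^\infty w(f)\, df, \\
\mu_{1,2} &= \int_0^\infty w(f) \cos [2\pi (f - f_m) \tau_{1,2}]\, df, \\
\mu_{2,3} &= \int_0^\infty w(f) \cos [2\pi (f - f_m) \tau_{2,3}]\, df, \\
\mu_{1,3} &= \int_0^\infty w(f) \cos [2\pi (f - f_m)(\tau_{1,2} + \tau_{2,3})]\, df, \\
\nu_{1,2} &= \int_0^\infty w(f) \sin [2\pi (f - f_m) \tau_{1,2}]\, df, \\
\nu_{2,3} &= \int_0^\infty w(f) \sin [2\pi (f - f_m) \tau_{2,3}]\, df,
\end{aligned} \tag{6}
$$

and

$$\nu_{1,3} = \int_0^\infty w(f) \sin [2\pi (f - f_m)(\tau_{1,2} + \tau_{2,3})]\, df.$$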


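We now choose the power spectrum

$$w(f) = \frac{k}{a^2 + 16\pi^2 (f - f_m)^2}, \tag{7}$$

where $a$ is the half-power bandwidth in radians per
second, and $k$ is an arbitrary constant. Then

$$
\begin{aligned}
\psi_0 &= \int_0^\infty \frac{k\, df}{a^2 + 16\pi^2 (f - f_m)^2}
= \int_{-f_m}^\infty \frac{k\, df}{a^2 + 16\pi^2 f^2}
= \frac{k}{4\pi a} \int_{-4\pi f_m/a}^\infty \frac{dx}{1 + x^2}, \\
\mu &= \int_0^\infty \frac{k \cos [2\pi (f - f_m) \tau]\, df}{a^2 + 16\pi^2 (f - f_m)^2}
= \frac{k}{4\pi a} \int_{-4\pi f_m/a}^\infty \frac{\cos [(a/2) \tau x]\, dx}{1 + x^2}, \\
\nu &= \int_0^\infty \frac{k \sin [2\pi (f - f_m) \tau]\, df}{a^2 + 16\pi^2 (f - f_m)^2}
= \frac{k}{4\pi a} \int_{-4\pi f_m/a}^\infty \frac{\sin [(a/2) \tau x]\, dx}{1 + x^2}.
\end{aligned}
$$

As $2\pi f_m/a$ (the center frequency divided by the bandwidth)
becomes large,

$$\psi_0 \to \frac{k}{4a}, \qquad \mu \to \frac{k}{4a} \exp (-a\tau/2), \qquad \nu \to 0. \tag{8}$$

For simplicity in the final result, one writes

$$b_{1,2} = \exp (-a\tau_{1,2}), \qquad b_{2,3} = \exp (-a\tau_{2,3}), \tag{9}$$

so that

$$\mu_{1,2} = \psi_0 \sqrt{b_{1,2}}, \qquad \mu_{2,3} = \psi_0 \sqrt{b_{2,3}}, \qquad \mu_{1,3} = \psi_0 \sqrt{b_{1,2} b_{2,3}}.$$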

Since all the $\nu$s are zero, one is led to the obvious par-
titioning of the moment matrix for conciseness:

$$
[M] = \psi_0 \begin{bmatrix} A & 0 \\ 0 & A \end{bmatrix}, \tag{10a}
$$

where

$$
[A] = \begin{bmatrix}
1 & \sqrt{b_{1,2}} & \sqrt{b_{1,2}b_{2,3}}\\
\sqrt{b_{1,2}} & 1 & \sqrt{b_{2,3}}\\
\sqrt{b_{1,2}b_{2,3}} & \sqrt{b_{2,3}} & 1
\end{bmatrix}, \tag{10b}
$$

and, of course, the zero matrix is also 3 × 3.

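The inverse and determinant of the moment matrix
are also needed for the determination of the joint density
function. These are given by

$$
[M]^{-1} = \frac{1}{\psi_0}\begin{bmatrix} A^{-1} & 0 \\ 0 & A^{-1} \end{bmatrix}, \qquad |M| = \psi_0^6\,|A|^2, \tag{11a}
$$

where

$$
A^{-1} = \begin{bmatrix}
\dfrac{1}{1-b_{1,2}} & \dfrac{-\sqrt{b_{1,2}}}{1-b_{1,2}} & 0\\[2ex]
\dfrac{-\sqrt{b_{1,2}}}{1-b_{1,2}} & \dfrac{1-b_{1,2}b_{2,3}}{(1-b_{1,2})(1-b_{2,3})} & \dfrac{-\sqrt{b_{2,3}}}{1-b_{2,3}}\\[2ex]
0 & \dfrac{-\sqrt{b_{2,3}}}{1-b_{2,3}} & \dfrac{1}{1-b_{2,3}}
\end{bmatrix} \tag{11b}
$$

and

$$
|A| = (1-b_{1,2})(1-b_{2,3}). \tag{11c}
$$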


It should be pointed out at this juncture that the simple
solution obtainable for the joint density function of the
envelope samples is a result of the zero elements in the
upper right and lower left corners of the inverse matrix
$A^{-1}$. More precisely, the simple solution requires that
only three diagonals contain nonzero elements.
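The tridiagonal form of the inverse and the value of the determinant are easy to confirm numerically. The sketch below assumes the matrix [A] of (10b) with arbitrary test values of the two coefficients b; the helper names are hypothetical.

```python
import math

def a_matrix(b12, b23):
    """Normalized 3 x 3 moment matrix [A] of (10b)."""
    r12, r23 = math.sqrt(b12), math.sqrt(b23)
    return [[1.0, r12, r12 * r23],
            [r12, 1.0, r23],
            [r12 * r23, r23, 1.0]]

def a_inverse(b12, b23):
    """Tridiagonal inverse of [A] as written in (11b)."""
    r12, r23 = math.sqrt(b12), math.sqrt(b23)
    return [[1 / (1 - b12), -r12 / (1 - b12), 0.0],
            [-r12 / (1 - b12),
             (1 - b12 * b23) / ((1 - b12) * (1 - b23)),
             -r23 / (1 - b23)],
            [0.0, -r23 / (1 - b23), 1 / (1 - b23)]]

def matmul(p, q):
    """Plain 3 x 3 matrix product."""
    return [[sum(p[i][m] * q[m][j] for m in range(3)) for j in range(3)]
            for i in range(3)]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Arbitrary test values, 0 < b < 1.
b12, b23 = 0.6, 0.3
product = matmul(a_matrix(b12, b23), a_inverse(b12, b23))
det_a = det3(a_matrix(b12, b23))
```

`product` should be the identity matrix and `det_a` should equal (1 − b12)(1 − b23), in agreement with (11c).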

With the row matrix $X$ from (4) representing the six
variates, the six-dimensional normal density function
for the variates is

$$
p(X) = \frac{1}{(2\pi)^3\,|M|^{1/2}}\exp\left(-\tfrac{1}{2}\,X\,[M]^{-1}X_t\right), \tag{12}
$$

where $X_t$ is the transpose, or column, matrix. Rather
than write out this density function, one immediately
makes the substitution indicated by (3), $I_c = R\cos\theta$,
$I_s = R\sin\theta$. The differential element goes into $R\,dR\,d\theta$.
After suitable trigonometric manipulations, the joint
density function of the three $R$s and three $\theta$s becomes


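$$
\begin{aligned}
p(R_1,\theta_1,R_2,\theta_2,R_3,\theta_3) ={}& \frac{R_1R_2R_3}{(2\pi)^3\psi_0^3(1-b_{1,2})(1-b_{2,3})}\\
&\times\exp\left[-\frac{R_1^2}{2(1-b_{1,2})\psi_0} - \frac{(1-b_{1,2}b_{2,3})R_2^2}{2(1-b_{1,2})(1-b_{2,3})\psi_0} - \frac{R_3^2}{2(1-b_{2,3})\psi_0}\right]\\
&\times\exp\left[\frac{\sqrt{b_{1,2}}\,R_1R_2\cos(\theta_1-\theta_2)}{(1-b_{1,2})\psi_0} + \frac{\sqrt{b_{2,3}}\,R_2R_3\cos(\theta_2-\theta_3)}{(1-b_{2,3})\psi_0}\right].
\end{aligned}
\tag{13}
$$

Now,

$$
p(R_1,R_2,R_3) = \int_0^{2\pi}\!\!\int_0^{2\pi}\!\!\int_0^{2\pi} p(R_1,\theta_1,R_2,\theta_2,R_3,\theta_3)\,d\theta_1\,d\theta_2\,d\theta_3. \tag{14}
$$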


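If one temporarily replaces the coefficients of $\cos(\theta_1-\theta_2)$
and $\cos(\theta_2-\theta_3)$, in the exponent, by $d_{1,2}$ and $d_{2,3}$,
respectively, the pertinent part of the integration is then

$$
\frac{1}{2\pi}\int_0^{2\pi}d\theta_3\;\frac{1}{2\pi}\int_0^{2\pi}d\theta_2\,\exp[d_{2,3}\cos(\theta_2-\theta_3)]\;\frac{1}{2\pi}\int_0^{2\pi}d\theta_1\,\exp[d_{1,2}\cos(\theta_1-\theta_2)]
$$

$$
= \frac{1}{2\pi}\int_0^{2\pi}d\theta_3\;\frac{1}{2\pi}\int_{-\theta_3}^{2\pi-\theta_3}dx\,\exp[d_{2,3}\cos x]\;\frac{1}{2\pi}\int_{-\theta_3-x}^{2\pi-\theta_3-x}dy\,\exp[d_{1,2}\cos y],
$$

$x$ and $y$ being dummy variables of integration. The
periodicity of the two inner integrands permits one to
change the limits of integration to $(0, 2\pi)$, so that the
expression becomes

$$
\frac{1}{2\pi}\int_0^{2\pi}d\theta_3\;\frac{1}{2\pi}\int_0^{2\pi}dx\,\exp[d_{2,3}\cos x]\;\frac{1}{2\pi}\int_0^{2\pi}dy\,\exp[d_{1,2}\cos y]
= I_0(d_{1,2})\,I_0(d_{2,3}).
$$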
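The two facts used in this step, that shifting the limits of a full-period integral changes nothing and that the angular average of exp(d cos x) is the modified Bessel function I0(d), can be verified with a small sketch. The series routine and the midpoint rule below are illustrative choices, not part of the paper.

```python
import math

def bessel_i0(x, tol=1e-16):
    """Modified Bessel function I0 from its power series sum_k (x/2)^(2k) / (k!)^2."""
    term, total, k = 1.0, 1.0, 0
    while term > tol * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total

def angle_average(d, shift=0.0, n=256):
    """(1/2pi) * integral of exp(d*cos(x)) over one full period starting at `shift`.

    Midpoint rule; for a smooth periodic integrand it converges very quickly.
    """
    h = 2 * math.pi / n
    return sum(math.exp(d * math.cos(shift + (j + 0.5) * h)) for j in range(n)) / n

d = 1.7  # arbitrary test value
plain = angle_average(d)
shifted = angle_average(d, shift=-2.3)  # limits moved, as with (-theta3, 2*pi - theta3)
reference = bessel_i0(d)
```

Both `plain` and `shifted` agree with `reference`, which is why the triple integral collapses to the product of two Bessel functions.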


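The 3-sample density function is, therefore,

$$
\begin{aligned}
p(R_1,R_2,R_3) ={}& \frac{R_1R_2R_3}{\psi_0^3(1-b_{1,2})(1-b_{2,3})}\;
I_0\!\left(\frac{\sqrt{b_{1,2}}\,R_1R_2}{\psi_0(1-b_{1,2})}\right)
I_0\!\left(\frac{\sqrt{b_{2,3}}\,R_2R_3}{\psi_0(1-b_{2,3})}\right)\\
&\times\exp\left[-\frac{R_1^2}{2\psi_0(1-b_{1,2})} - \frac{(1-b_{1,2}b_{2,3})R_2^2}{2\psi_0(1-b_{1,2})(1-b_{2,3})} - \frac{R_3^2}{2\psi_0(1-b_{2,3})}\right].
\end{aligned}
\tag{15}
$$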


GENERALIZATION TO N SAMPLES


Since the general method of solution for the N-sample 
case corresponds closely to that for three samples, only 
the points of divergence are given here. Eqs. (1)-(9) still 
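apply after inclusion of all $N$ variables in each equation.
Again the matrices are written in the form

$$
[M] = \psi_0\begin{bmatrix} A & 0\\ 0 & A\end{bmatrix}, \qquad
[M]^{-1} = \frac{1}{\psi_0}\begin{bmatrix} C & 0\\ 0 & C\end{bmatrix},
$$

where $C = A^{-1}$. For conciseness only the general terms
of $[A]$ and $[C]$ are given:

$$
a_{m,n} =
\begin{cases}
\sqrt{b_{m,m+1}\,b_{m+1,m+2}\cdots b_{n-1,n}} & \text{if } m < n,\\
1 & \text{if } m = n,\\
a_{n,m} & \text{if } m > n,
\end{cases}
\tag{16}
$$

with

$$
b_{j,j+1} = \exp(-a\tau_{j,j+1}),
$$

$$
c_{m,m+1} = c_{m+1,m} = \frac{-\sqrt{b_{m,m+1}}}{1-b_{m,m+1}}, \qquad
c_{1,1} = \frac{1}{1-b_{1,2}}, \qquad c_{N,N} = \frac{1}{1-b_{N-1,N}}, \tag{17}
$$

$$
c_{m,m} = \frac{1-b_{m-1,m}\,b_{m,m+1}}{(1-b_{m-1,m})(1-b_{m,m+1})}, \quad m \ne 1, N; \qquad
c_{m,n} = 0 \text{ otherwise}. \tag{18}
$$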


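Substituting these matrices in (12) leads to a very
involved 2N-dimensional density function in the several
$R$s and $\theta$s. The multiple integral over the $\theta$s can fortu-
nately be performed by the same technique as before, by
substituting dummy variables and changing the limits
of integration. The end result is the joint density function
for $N$ successive samples of the envelope:

$$
\begin{aligned}
p(R_1,R_2,\ldots,R_N) ={}& \frac{R_1R_2\cdots R_N}{\psi_0^N}\prod_{m=1}^{N-1}\frac{1}{1-b_{m,m+1}}\,
I_0\!\left(\frac{\sqrt{b_{m,m+1}}\,R_mR_{m+1}}{\psi_0(1-b_{m,m+1})}\right)\\
&\times\exp\left[-\frac{R_1^2}{2\psi_0(1-b_{1,2})} - \frac{R_N^2}{2\psi_0(1-b_{N-1,N})}
- \frac{1}{2\psi_0}\sum_{n=2}^{N-1}\frac{(1-b_{n-1,n}\,b_{n,n+1})R_n^2}{(1-b_{n-1,n})(1-b_{n,n+1})}\right].
\end{aligned}
\tag{19}
$$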


CONDITIONAL DENSITY FUNCTION


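The probability density of the $N$th envelope sample,
given the previous $N-1$ samples of the envelope, may be
written as

$$
p(R_N \mid R_1,R_2,\ldots,R_{N-1}) = \frac{p(R_1,R_2,\ldots,R_{N-1},R_N)}{p(R_1,R_2,\ldots,R_{N-1})} \tag{20}
$$

or

$$
p(R_N \mid R_1,\ldots,R_{N-1}) = \frac{R_N}{(1-b_{N-1,N})\psi_0}\,
I_0\!\left(\frac{\sqrt{b_{N-1,N}}\,R_{N-1}R_N}{(1-b_{N-1,N})\psi_0}\right)
\exp\left[-\frac{R_N^2 + b_{N-1,N}R_{N-1}^2}{2(1-b_{N-1,N})\psi_0}\right]. \tag{21}
$$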


Eq. (21) was obtained by twice substituting (19) into
(20). Rice's (3.7-10) and (3.7-13), modified to fit the
notation used here, read

$$
\begin{aligned}
p(R_{N-1}) &= \frac{R_{N-1}}{\psi_0}\exp\left(-\frac{R_{N-1}^2}{2\psi_0}\right),\\
p(R_{N-1},R_N) &= \frac{R_{N-1}R_N}{(1-b_{N-1,N})\psi_0^2}
\exp\left[-\frac{R_{N-1}^2+R_N^2}{2(1-b_{N-1,N})\psi_0}\right]
I_0\!\left(\frac{\sqrt{b_{N-1,N}}\,R_{N-1}R_N}{(1-b_{N-1,N})\psi_0}\right),
\end{aligned}
\tag{22}
$$

and since the conditional density of the $N$th sample, given
the $(N-1)$th, is

$$
p(R_N \mid R_{N-1}) = \frac{p(R_{N-1},R_N)}{p(R_{N-1})},
$$

we find by simple division that

$$
p(R_N \mid R_{N-1}) = p(R_N \mid R_1,R_2,\ldots,R_{N-1}), \tag{23}
$$


which is a necessary condition for the envelope to be a
Markoff process. The other necessary conditions, as given
in Wang and Uhlenbeck, are that

$$
\begin{aligned}
&p(R_N \mid R_{N-1}) \ge 0,\\
&\int_0^\infty p(R_N \mid R_{N-1})\,dR_N = 1,\\
&p(R_N) = \int_0^\infty p(R_N \mid R_{N-1})\,p(R_{N-1})\,dR_{N-1}.
\end{aligned}
\tag{24}
$$

In Rice's work, it is seen that the conditions listed in (24)
are satisfied.
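The division leading to (21) and (23), and the normalization condition in (24), can be checked numerically. The sketch below codes the N-sample density as reconstructed in (19) and the conditional density (21); the function names, the test values, and the integration grid are assumptions made for illustration.

```python
import math

def bessel_i0(x, tol=1e-16):
    """Modified Bessel function I0 from its power series."""
    term, total, k = 1.0, 1.0, 0
    while term > tol * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total

def joint_density(R, b, psi0):
    """Joint density (19) of N >= 2 envelope samples; b[m] links samples m and m+1."""
    n_samples = len(R)
    value = 1.0
    for r in R:
        value *= r / psi0
    for m in range(n_samples - 1):
        arg = math.sqrt(b[m]) * R[m] * R[m + 1] / (psi0 * (1 - b[m]))
        value *= bessel_i0(arg) / (1 - b[m])
    expo = R[0] ** 2 / (2 * psi0 * (1 - b[0])) + R[-1] ** 2 / (2 * psi0 * (1 - b[-1]))
    for n in range(1, n_samples - 1):
        expo += ((1 - b[n - 1] * b[n]) * R[n] ** 2
                 / (2 * psi0 * (1 - b[n - 1]) * (1 - b[n])))
    return value * math.exp(-expo)

def conditional_density(r_next, r_prev, b_last, psi0):
    """Conditional density (21) of the newest sample given only the previous one."""
    s = psi0 * (1 - b_last)
    return ((r_next / s) * bessel_i0(math.sqrt(b_last) * r_prev * r_next / s)
            * math.exp(-(r_next ** 2 + b_last * r_prev ** 2) / (2 * s)))

# Arbitrary test values.
psi0, b, R = 1.5, [0.5, 0.2], [0.8, 1.9, 1.1]

# (20): ratio of the 3-sample to the 2-sample density ...
ratio = joint_density(R, b, psi0) / joint_density(R[:2], b[:1], psi0)
# ... compared with (21), which ignores R[0] entirely.
markov = conditional_density(R[2], R[1], b[1], psi0)

# Normalization condition of (24), midpoint rule on [0, 20].
steps, upper = 4000, 20.0
h = upper / steps
total = sum(conditional_density((j + 0.5) * h, R[1], b[1], psi0)
            for j in range(steps)) * h
```

`ratio` and `markov` coincide, showing that the earliest sample drops out of the conditional density, and `total` is close to one.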
REMARKS


It has been shown that if a Gaussian noise has a 
power spectrum such as obtained from a single-tuned 
narrow-band RLC filter with white noise at the input, 
the envelope constitutes a Markoff process. It should 
be mentioned that not only the envelope, but all the 
one-to-one zero-memory functions of this envelope will 
also be Markoff. In particular, if one defines the instan- 
taneous power of the noise as one half the square of the 
instantaneous envelope, this is also Markoff. We have 
thus arrived at a new family of Markoff processes. 
Although no simpler proof is apparent, it is not surprising
that the envelope of the noise with the given power
spectrum should be Markoff. For the correlation matrix
that results from this spectrum, the inphase and
quadrature components of the narrow-band noise are
independent Gaussian variates, and by Doob's result
these components themselves constitute identical Markoff 
processes. The envelope formed from the squares of these 
components might therefore be expected to be Markoff. 
Prediction and Filtering for Random Parameter Systems*


F. J. BEUTLER 


Summary—This work generalizes the Wiener-Kolmogorov 
theory of optimum linear filtering and prediction of stationary 
random inputs. It is assumed here that signal and noise have 
passed through a random device before being available for filtering 
and prediction. A random device is a unit whose behavior depends 
on an unknown parameter for which an a priori probability dis- 
tribution is given. 

A number of engineering applications are cited. Two of these 
are worked out in some detail to illustrate the optimization pro- 
cedure. 


I. INTRODUCTION 


THE transfer function describing a linear time
invariant system is often dependent on its en- 
vironment or its application. Since utilization 


* Manuscript received by the PGIT, June 4, 1958. This work
was conducted in part by Project MICHIGAN under Dept. of
the Army Contract DA-36-039 SC-52654, administered by the
U. S. Army Signal Corps.

† Dept. of Aero. Eng., University of Michigan, Ann Arbor, Mich.


of equipment is not always precisely known at the time 
of its design, the designer must take into account the 
ensemble of situations with which the system might be 
faced. The inevitable result is a compromise design, 
perhaps intuitively slanted toward those situations 
which the system will encounter most frequently. 

This paper presents a systematic technique of mean 
square optimum design for systems containing random 
components whose ensemble characteristics are known. 
It is assumed that the system to be optimized is presented 
with a (wide sense) stationary input which has been 
passed through a component with transfer function 
H(w, y); y is a random variable for which the distribution 
function is given. 

Before beginning the analysis, several examples are
given where such a method is useful. As a first example,
consider an amplitude-modulated ground-to-ground
communications link. Here the ground reflections depend
on the transmission distance, nature of terrain, height
of antennas above ground, and the modulating frequency.
The problem is to design a filter which removes the
reflection components of the incoming signal, as well as
receiver noise and ground clutter. Evidently, the designer
does not know in advance the spectrum of the ground
reflections or the signal power at the receiver, especially
when the communications link is intended to be versatile.
He can determine, however, the probability of its use in
each possible mode and type of terrain.

A second example is concerned with an antiaircraft 
gun in which the chief problem is one of predicting the 
position of the target over the time required for the shell 
to reach the proper altitude. This problem has been 
treated by Wiener. However, it may be assumed that 
random variations in air density and muzzle velocity of 
the shell affect the time of flight. Then the true time of
flight (and thus the prediction time) is unknown and 
can only be determined statistically. 

As a third application, suppose that a quantity is
measured by a transducer located in an inaccessible or
dangerous place. Although the transducer characteristics
change with time, recalibration is impossible. Neverthe-
less, the ensemble behavior of such transducers can be
established through experimentation, and this information
used to treat the transducer data optimally by the
methods of this paper.


II. DERIVATION OF THE OPTIMUM FILTER


The filter problem under consideration perhaps is best
summarized by the flow diagram in Fig. 1. Assume for
the moment that the noise n(t) = 0. The signal x(t) is
taken to be wide sense stationary and continuous in the
mean.¹ The desired or ideal output, y(t), is obtained by
subjecting x(t) to an ideal operator u(ω), defined in the
frequency (but not necessarily the time) domain. The
operator k(ω) is the optimum filter which is sought. It is
restricted to being the transform of a physically realizable
weighting function. This means that the output of the
filter k(ω) cannot anticipate the input or, what is equiva-
lent, the weighting function to which k(ω) is related must
be zero for negative argument.

A novel feature of the filter problem depicted by
Fig. 1 is that the filter cannot act on the signal itself, but
on a randomly distorted version of the signal. This is the
effect of the random parameter system H(ω, γ) through
which x(t) is passed before being available for filtering.
An alternative formulation, of course, is that the optimum
filter is always cascaded with H(ω, γ); that is, part of the
filter itself is fixed in a random manner.

Since the input z(t, γ) to the optimum filter has a
spectrum which is dependent on γ, the z process is (in

¹ See J. L. Doob, "Stochastic Processes," John Wiley and Sons,
Inc., New York, N. Y., pp. 95, 518; 1953. These are minimum re-
strictions for optimum filtering.
N. Wiener requires stronger conditions in "Extrapolation, Inter-
polation, and Smoothing of Stationary Time Series," John Wiley
and Sons, Inc., New York, N. Y.; 1949.

Fig. 1: Simplified random parameter filter problem.


general) not ergodic, so that the entire optimization 
procedure must be expressed in terms of ensembles. This 
approach also leads to more general results; for instance, 
the ideal operator u(w) need not have a time domain 
interpretation. Dealing with ensemble (rather than time) 
averages also makes it easy to prove the existence and 
uniqueness of the optimum filter and to show that the 
minimum mean square error can always be approached 
as closely as desired by stable lumped parameter networks.” 
A signal x(t) can be represented by

$$
x(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega t}\,dX(\omega) \tag{1}
$$

if x(t) is a wide sense stationary process which is con-
tinuous in the mean.³ Here X(ω) is a process with orthogo-
nal increments for which

$$
E\,|X(\omega_2) - X(\omega_1)|^2 = 2\pi\int_{\omega_1}^{\omega_2}\Phi(\omega)\,d\omega \tag{2}
$$

where Φ(ω), the spectral density of x(t), is assumed to
exist.

Now y(t) is the result of operating on x(t) with ideal
operator u(ω). The representation for y(t) is then

$$
y(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega t}u(\omega)\,dX(\omega) \tag{3}
$$

in which any u(ω) which satisfies

$$
\int_{-\infty}^{\infty} |u(\omega)|^2\,\Phi(\omega)\,d\omega < \infty \tag{4}
$$

is admissible.⁴ The class of admissible u(ω) includes
prediction [i.e., $u(\omega) = e^{i\alpha\omega}$ with α > 0], differentiation
[if $\int_{-\infty}^{\infty}\omega^2\Phi(\omega)\,d\omega < \infty$], as well as any u(ω) which is the
transform of an absolutely integrable weighting function
U(t). In the latter case,

$$
\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega t}u(\omega)\,dX(\omega) = \int_{-\infty}^{\infty} x(\tau)\,U(t-\tau)\,d\tau \tag{5}
$$

with probability 1; this establishes the mathematical link
between time and frequency domains.

The signal z(t, γ) into the optimum filter is expressed as


² F. J. Beutler, "A generalization of Wiener optimum filtering
and prediction," Doctoral dissertation, California Inst. Tech., Pasa-
dena, Calif., pp. 19-20; 1957.

³ Doob, op. cit., pp. 527-528.

⁴ That is, u(ω) must be such that y(t) has a finite mean square.

$$
z(t,\gamma) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega t}H(\omega,\gamma)\,dX(\omega) \tag{6}
$$

which is analogous to (3). The optimum filter output
w(t, γ) represents a linear operation k(ω) on z(t, γ) so that

$$
w(t,\gamma) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega t}H(\omega,\gamma)\,k(\omega)\,dX(\omega). \tag{7}
$$

Since the error e(t, γ) is defined as y(t) − w(t, γ), we have
from (3) and (7):

$$
e(t,\gamma) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega t}\,[u(\omega) - H(\omega,\gamma)k(\omega)]\,dX(\omega). \tag{8}
$$


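The mean-square-error ε² is determined by averaging
|e(t, γ)|² over the ensemble of signals x(t), and averaging
also over γ. These steps may be performed in succession.
Thus, (8) is used to average with respect to x(t), and to
obtain the result

$$
\epsilon^2 = E_\gamma\{E_x\,|e(t,\gamma)|^2\}
= E_\gamma\int_{-\infty}^{\infty} |u(\omega) - H(\omega,\gamma)k(\omega)|^2\,\Phi(\omega)\,d\omega. \tag{9}
$$

For convenience in notation, expectations on γ are written
as

$$
E\big[\overline{H(\omega,\gamma)}\big] = \int \overline{H(\omega,\gamma)}\,dF(\gamma) = \bar H(\omega) \tag{10}
$$

(the overbar on H(ω, γ) denoting the complex conjugate) and

$$
E\,|H(\omega,\gamma)|^2 = \int |H(\omega,\gamma)|^2\,dF(\gamma) = G(\omega) \tag{11}
$$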

in which F(y) is the distribution function for the random 
variable y. Expansion of (9) followed by a regrouping of 
terms then yields 


$$
\epsilon^2 = \int_{-\infty}^{\infty}\left|k(\omega) - \frac{u(\omega)\bar H(\omega)}{G(\omega)}\right|^2 G(\omega)\,\Phi(\omega)\,d\omega
+ \int_{-\infty}^{\infty}\left[1 - \frac{|\bar H(\omega)|^2}{G(\omega)}\right]|u(\omega)|^2\,\Phi(\omega)\,d\omega. \tag{12}
$$

The second integral in (12) is seen to be independent
of any optimization of k(ω). This term is always non-
negative by the Schwarz inequality; in fact, |H̄(ω)|² =
G(ω) if and only if there is an H(ω) such that H(ω) =
H(ω, γ) independently of γ. In physical terms, a random
transfer function element cascaded with the filter k(ω)
always leaves a residual mean square error. Because the
integral does not depend on k(ω), the residual error
remains even if k(ω) is not restricted to realizable filters.
Consider now the first integral of (12). The optimization
problem is solved if we find that k(ω) which minimizes the
integral. The method of solution proposed here places the
minimization of the integral within the context of Wiener
filtering and prediction. Suppose that a process r(t) has
spectral density Φ(ω)G(ω), and that the desired output
of the system is obtained by subjecting r(t) to the transfer
function u(ω)H̄(ω)/G(ω). The actual output is generated

Fig. 2: Wiener problem having the same optimum filter as in Fig. 1.


by passing r(t) through a physically realizable filter with
transfer function k(ω). These relationships are depicted
in Fig. 2. The mean square error for this system is

$$
\epsilon'^2 = \int_{-\infty}^{\infty}\left|k(\omega) - \frac{u(\omega)\bar H(\omega)}{G(\omega)}\right|^2\Phi(\omega)\,G(\omega)\,d\omega \tag{13}
$$

which corresponds precisely to the first integral in (12).
Thus the problems are identical and the Wiener technique
is applicable, if only Φ(ω)G(ω) is factorable.
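The quantities H̄(ω) and G(ω) that define the equivalent Wiener problem, and the weight 1 − |H̄|²/G of the residual error, can be computed directly for a discrete distribution of the random parameter. The sketch below is illustrative only: the first-order lag with a random time constant is a hypothetical choice of H(ω, γ), and H̄ is taken as the mean of the conjugate of H (the residual weight depends only on |H̄| and is unaffected by that convention).

```python
def conjugate_mean_and_power(H, omega, gammas, probs):
    """Return (H_bar, G) at one frequency for a discrete distribution of gamma.

    H_bar = E[conj H(omega, gamma)], G = E|H(omega, gamma)|^2.
    """
    h_bar = sum(p * H(omega, g).conjugate() for g, p in zip(gammas, probs))
    g_val = sum(p * abs(H(omega, g)) ** 2 for g, p in zip(gammas, probs))
    return h_bar, g_val

def residual_fraction(h_bar, g_val):
    """1 - |H_bar|^2 / G: the weight of |u|^2 * Phi in the second integral of (12)."""
    return 1.0 - abs(h_bar) ** 2 / g_val

def random_lag(omega, gamma):
    """Hypothetical random element: first-order lag with time constant gamma."""
    return 1.0 / (1.0 + 1j * omega * gamma)

omega = 2.0
# Truly random time constant: three values with the given probabilities.
h_bar, g_val = conjugate_mean_and_power(random_lag, omega, [0.5, 1.0, 2.0], [0.25, 0.5, 0.25])
frac_random = residual_fraction(h_bar, g_val)

# Degenerate case: the time constant is known with probability 1.
h_fix, g_fix = conjugate_mean_and_power(random_lag, omega, [1.0], [1.0])
frac_fixed = residual_fraction(h_fix, g_fix)
```

`frac_random` is strictly between 0 and 1, while `frac_fixed` vanishes, in line with the Schwarz-inequality remark that the residual term disappears only when H does not depend on γ.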

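The question of the factorization of Φ(ω)G(ω) is deferred
for the moment; we assume that $\Phi(\omega) = \psi_1(\omega)\overline{\psi_1(\omega)}$ and
$G(\omega) = G_1(\omega)\overline{G_1(\omega)}$. Here $\psi_1$ and $G_1$ are Fourier trans-
forms of $L_2$ functions⁵ which are zero for negative argu-
ment. The optimum Wiener filter for the problem of Fig.
2 (and thus for the problem of Fig. 1) then is known to be

$$
k(\omega) = \frac{1}{2\pi\,\psi_1(\omega)G_1(\omega)}\int_0^{\infty} e^{-i\omega t}
\left[\int_{-\infty}^{\infty} e^{it\rho}\,\frac{\psi_1(\rho)\bar H(\rho)u(\rho)}{\overline{G_1(\rho)}}\,d\rho\right]dt. \tag{14}
$$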
w ae 1\P. 


The k(ω) determined by (14) is merely the projection of
u(ω)H̄(ω)/G(ω) on the space of realizable functions, as
shown by Doob.⁶ Corresponding to k(ω) the mean square
error ε'² is

$$
\epsilon'^2 = \frac{1}{2\pi}\int_{-\infty}^{0}\left|\int_{-\infty}^{\infty} e^{it\rho}\,
\frac{\psi_1(\rho)\bar H(\rho)u(\rho)}{\overline{G_1(\rho)}}\,d\rho\right|^2 dt. \tag{15}
$$

The latter computation follows directly from (13) and
(14). It follows also that the total mean-square-error ε² for
the original problem is determined as

$$
\epsilon^2 = \frac{1}{2\pi}\int_{-\infty}^{0}\left|\int_{-\infty}^{\infty} e^{it\rho}\,
\frac{\psi_1(\rho)\bar H(\rho)u(\rho)}{\overline{G_1(\rho)}}\,d\rho\right|^2 dt
+ \int_{-\infty}^{\infty}\Phi(\omega)\,|u(\omega)|^2\left[1 - \frac{|\bar H(\omega)|^2}{G(\omega)}\right]d\omega \tag{16}
$$

from (12) and (15).

We return now to the factorization problem. In the
first place, both Φ and G must be factorable if their
product is to be factored. According to a theorem of


⁵ A function f(t) is said to be in $L_p$ if $\int_{-\infty}^{\infty}|f(t)|^p\,dt < \infty$.

⁶ This is a rephrasing of the filtering and prediction problem in
terms of Hilbert spaces. Such a treatment permits more general
results in a rigorous fashion. See Doob, op. cit., ch. 12.

Paley and Wiener,⁷ a nonnegative function in $L_1$ [for
example, Φ(ω)] is factorable if and only if

$$
\int_{-\infty}^{\infty}\frac{\log\Phi(\omega)}{1+\omega^2}\,d\omega > -\infty. \tag{17}
$$

The condition (17) is also necessary and sufficient to
insure that the entire future of x(t) is not precisely pre-
dictable from its past.⁸ Therefore, it can be assumed that
(17) is true because it makes no sense to predict or filter
deterministic functions.

As for G(ω), assume that H(ω, γ) is the Fourier transform
of h(t, γ) in $L_2$, where h(t, γ) = 0 when t < 0 for all γ.
Then H̄(ω) is the transform of a function which is zero
for negative t, and therefore

$$
\int_{-\infty}^{\infty}\frac{\log|\bar H(\omega)|^2}{1+\omega^2}\,d\omega > -\infty \tag{18}
$$

by the above-mentioned theorem of Paley and Wiener.
But by the Schwarz inequality, G(ω) ≥ |H̄(ω)|², so that
(18) implies

$$
\int_{-\infty}^{\infty}\frac{\log G(\omega)}{1+\omega^2}\,d\omega > -\infty. \tag{19}
$$

This is exactly the condition (17) applied to G(ω). Then
G(ω) is factorable.

It should be noted that G(ω) is not rational even if
H(ω, γ) is rational, unless γ can assume only a finite
number of values. This makes the factorization more
difficult, but factors can always be constructed⁹ if h(t, γ)
meets the requirements of the preceding paragraph.


III. FILTERING AND PREDICTION OF A
SIGNAL WITH NOISE


Fig. 3 illustrates an extension of the preceding analysis.
A noise input has been added, and the noise passed
through a random system having a transform L(ω, μ).
The signal input is passed through H(ω, γ), as before.

If the spectral density of the signal and noise are
denoted by $\Phi_s(\omega)$ and $\Phi_n(\omega)$, respectively, the spectral
density of the total input is

$$
\Phi(\omega) = \Phi_s(\omega)G(\omega) + \Phi_n(\omega)M(\omega). \tag{20}
$$

Here M(ω) is defined by M(ω) = E|L(ω, μ)|². γ and μ
need not be independent; (20) only assumes that signal
and noise are uncorrelated.

If M(ω) meets the same conditions as those previously
imputed to G(ω), it is possible to factor Φ(ω) whenever
signal and/or noise spectra satisfy (17). Suppose, then,

⁷ R. E. A. C. Paley and N. Wiener, "Fourier Transforms in the
Complex Domain," American Mathematical Society Colloquium
Publication, vol. 19, pp. 17-20; 1934.
⁸ A. N. Kolmogorov, "Interpolation und Extrapolation von
stationären zufälligen Folgen," Bull. Acad. Sci. U.S.S.R., Ser. Math.,
vol. 5, pp. 5-14; 1941.
⁹ A method of accomplishing this construction is given by N.
Levinson, "A heuristic exposition of Wiener's mathematical theory
of prediction and filtering," J. Math. Phys., vol. 26, pp. 110-119;
July, 1947.

Fig. 3: Generalized random parameter filter problem.


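that $\Phi(\omega) = \chi_1(\omega)\overline{\chi_1(\omega)}$. In that case, the optimum filter
is found to be

$$
k(\omega) = \frac{1}{2\pi\,\chi_1(\omega)}\int_0^{\infty} e^{-i\omega t}
\left[\int_{-\infty}^{\infty} e^{it\rho}\,\frac{\Phi_s(\rho)\bar H(\rho)u(\rho)}{\overline{\chi_1(\rho)}}\,d\rho\right]dt \tag{21}
$$

by methods identical with those of the preceding section.
Likewise, a mean-square-error computation yields

$$
\epsilon^2 = \frac{1}{2\pi}\int_{-\infty}^{0}\left|\int_{-\infty}^{\infty} e^{it\omega}\,
\frac{\Phi_s(\omega)\bar H(\omega)u(\omega)}{\overline{\chi_1(\omega)}}\,d\omega\right|^2 dt
+ \int_{-\infty}^{\infty}\Phi_s(\omega)\,|u(\omega)|^2\left[1 - \frac{\Phi_s(\omega)|\bar H(\omega)|^2}{\Phi(\omega)}\right]d\omega. \tag{22}
$$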

As might be expected, a reduction to ordinary Wiener
filtering is obtained if H(ω, γ) is not truly random. For
instance, H(ω, γ) may be independent of γ, or γ may
assume a specified value with probability 1. It follows
that G(ω) = |H̄(ω)|², since now averaging over γ and
squaring H(ω, γ) may be freely interchanged. The optimi-
zation problem in question then is exactly equivalent to
the following Wiener problem: let the filter input consist
of a signal having spectral density $\Phi_s(\omega)|H(\omega)|^2$ and
noise with spectral density $\Phi_n(\omega)M(\omega)$, and make the
desired (ideal) operator on the signal equal to u(ω)/H(ω).
Indeed, the Wiener filter for the problem is given by (21),
and the least mean square error by (22), providing that
the substitution G(ω) = |H̄(ω)|² is made.

The relationship between the filtering proposed here
and Wiener filtering becomes even more apparent when
no noise is present. This is the case treated in the preceding
section. There, part of the numerator and denominator in
the integrand of (15) are seen to cancel. In the mean-
square-error expression (16), the second integral disappears
entirely, since now |H̄(ω)|²/G(ω) = 1. In other words, a
filter is designed for a signal spectrum for which the ideal
operator is to be u(ω)/H(ω).

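A case of special interest is that of pure prediction or
lag filters in the presence of noise. This means that
$u(\omega) = e^{i\alpha\omega}$: α positive implies prediction, while α negative
calls for a lagging filter. We now have

$$
k(\omega) = \frac{1}{2\pi\,\chi_1(\omega)}\int_0^{\infty} e^{-i\omega t}
\left[\int_{-\infty}^{\infty} e^{i(t+\alpha)\rho}\,\frac{\Phi_s(\rho)\bar H(\rho)}{\overline{\chi_1(\rho)}}\,d\rho\right]dt \tag{23}
$$

for the optimum filter. A simple computation yields

$$
\epsilon^2 = \frac{1}{2\pi}\int_{-\infty}^{\alpha}\left|\int_{-\infty}^{\infty} e^{it\omega}\,
\frac{\Phi_s(\omega)\bar H(\omega)}{\overline{\chi_1(\omega)}}\,d\omega\right|^2 dt
+ \int_{-\infty}^{\infty}\Phi_s(\omega)\left[1 - \frac{\Phi_s(\omega)|\bar H(\omega)|^2}{\Phi(\omega)}\right]d\omega \tag{24}
$$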


as the mean square error. The mean square error is thus a
monotone nondecreasing function of α. If the prediction
interval is infinite, the optimum filter has zero for its
transfer function, and the mean square error is given by
the input signal power. When an infinite lag is permitted,¹⁰
we have $k(\omega) = \Phi_s(\omega)\bar H(\omega)/\Phi(\omega)$, and the only error is
due to the randomness of the system, viz.,

$$
\epsilon^2 = \int_{-\infty}^{\infty}\Phi_s(\omega)\left[1 - \frac{\Phi_s(\omega)|\bar H(\omega)|^2}{\Phi(\omega)}\right]d\omega. \tag{25}
$$

IV. EXAMPLES

The first example studied now is concerned with a
communication link in which it is desired that the trans-
mitted message x(t) be exactly reproduced at the receiver
in real time. This means that a predicting filter is necessary
at the receiver to obtain the best estimate of x(t) from the
present and past of x(t − γ); γ is the time delay between
transmission and reception of the message. Such a delay
is the result of propagation time (particularly important
in acoustic devices), and perhaps lags occurring in the
modulation and/or demodulation process.

The future utilization of a mobile unit is not generally
known at the time of its design, so that the designer may
well wish to regard the delay γ as a random variable whose
probability distribution he can determine from empirical
data gathered on similar apparatus. Furthermore, toler-
ances and aging of electronic components will also affect
γ in a random manner.

It is seen from the preceding discussion that the opti-
mum filter must treat the signal x(t) after it has been
subjected to the random delay operator $H(\omega,\gamma) = e^{-i\omega\gamma}$.
Since noise is assumed absent, the results of Section II are
applicable. In terms of the notation of that section,
u(ω) = 1 because the desired filter output is identical
with the original signal.

If the probability distribution function of γ is given by
F(γ), H̄(ω) may be evaluated as

$$
\bar H(\omega) = \int_0^{\infty} e^{i\omega\gamma}\,dF(\gamma) = \phi(\omega) \tag{26}
$$

where φ(ω) is, by definition, the characteristic function¹¹
for the random variable γ. That G(ω) = 1 is seen from
the fact that |H(ω, γ)| is unity for every ω and γ.
Substitution in (14) yields the result


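$$
k(\omega) = \frac{1}{2\pi\,\psi_1(\omega)}\int_0^{\infty} e^{-i\omega t}
\left[\int_{-\infty}^{\infty} e^{it\rho}\,\psi_1(\rho)\phi(\rho)\,d\rho\right]dt \tag{27}
$$

¹⁰ An infinite lag becomes applicable whenever recorded data
are to be reduced at a later date.

¹¹ A definition and some of the properties of characteristic func-
tions are given by Doob, op. cit., pp. 37 ff.

for the optimum filter. The mean square error for this
filter is

$$
\epsilon^2 = \frac{1}{2\pi}\int_{-\infty}^{0}\left|\int_{-\infty}^{\infty} e^{it\rho}\,\psi_1(\rho)\phi(\rho)\,d\rho\right|^2 dt
+ \int_{-\infty}^{\infty}\Phi(\omega)\,[1 - |\phi(\omega)|^2]\,d\omega \tag{28}
$$

from (16).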
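To see how the residual term of (28) behaves, one may evaluate φ(ω) and the weight 1 − |φ(ω)|² for a concrete delay distribution. The sketch below assumes a delay uniformly distributed on an interval, which is an illustrative choice and not a case treated in the paper; the closed form is cross-checked against a simple numerical average.

```python
import cmath

def phi_uniform(omega, lo, hi):
    """Characteristic function E[exp(i*omega*gamma)] for gamma uniform on [lo, hi]."""
    if omega == 0.0:
        return 1.0 + 0.0j
    return ((cmath.exp(1j * omega * hi) - cmath.exp(1j * omega * lo))
            / (1j * omega * (hi - lo)))

def phi_numeric(omega, lo, hi, n=2000):
    """Same quantity by a midpoint-rule average over the interval."""
    h = (hi - lo) / n
    return sum(cmath.exp(1j * omega * (lo + (j + 0.5) * h)) for j in range(n)) / n

def residual_weight(omega, lo, hi):
    """1 - |phi(omega)|^2, the factor multiplying Phi(omega) in the second term of (28)."""
    return 1.0 - abs(phi_uniform(omega, lo, hi)) ** 2

lo, hi = 0.8, 1.2  # hypothetical delay range, in the same time units as 1/omega
weights = [residual_weight(w, lo, hi) for w in (0.0, 1.0, 5.0, 20.0)]
```

The weight is zero at ω = 0 and grows toward one at frequencies where the delay uncertainty spans a large fraction of a period, so the unavoidable error is concentrated in the high-frequency part of the signal spectrum.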

An interesting variation of the above obtains the
solution to the antiaircraft gun problem described in the
Introduction. If the shell is fired at time t, we must
be able to predict the target position at time t + γ, i.e.,
the time at which the shell reaches the altitude of the
target. In other words, a prediction filter is required to
compensate as nearly as possible for the delay γ inherent
in the system.

If it were possible to designate γ in advance, the filter
should be designed by Wiener's method; however, the
time of flight to the altitude of the target aircraft may
vary from occasion to occasion, so that γ is known only
in a statistical sense.

For the purposes of this illustration, a one-dimensional
solution of the problem will be offered. The target position
is denoted by x, while v is taken to represent its velocity.
The shell is fired at time t, and x(τ) is known for all τ ≤ t.
It is desired to make a least square estimate of x(t + γ),
γ being a random variable with some probability distri-
bution function F(γ).

It has sometimes been assumed that v is stationary with
spectral density¹²

$$
\Phi_v(\omega) = \frac{1}{1+\omega^2}. \tag{29}
$$

The form (29) raises the difficulty that x (the integral
of v) is nonstationary, whereas our theory applies only to
stationary inputs. The stationarity problem is surmounted
as follows. Since x(t) is already known when the shell is
fired, only a prediction of x(t + γ) − x(t) need be made.
This difference is produced by subjecting the stationary
process v to the operator $(e^{i\omega\gamma} - 1)/i\omega$. To prove that this
operator on v gives the desired quantity, consider the
representation for v(t):

$$
v(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega t}\,dV(\omega). \tag{30}
$$

Applying the operator in question then yields

$$
\begin{aligned}
\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega t}\left(\frac{e^{i\omega\gamma}-1}{i\omega}\right)dV(\omega)
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega t}\left[\int_0^{\gamma} e^{i\omega\tau}\,d\tau\right]dV(\omega)\\
&= \int_0^{\gamma} v(t+\tau)\,d\tau = x(t+\gamma) - x(t).
\end{aligned}
\tag{31}
$$

¹² This form of the spectral density is justified in H. M. James,
N. B. Nichols, and R. S. Phillips, "Theory of Servomechanisms,"
M.I.T. Rad. Lab. Ser., McGraw-Hill Book Co., Inc., New York,
N. Y., no. 25, pp. 300-304; 1947.

The antiaircraft problem now can be defined in terms
of Fig. 1. The system input is now v(t), and H(ω, γ) = 1.
Indeed, it is the ideal operator which now varies according
to target altitude. In recognition of this fact, this operator
now is called u(ω, γ). It follows from (31) that

$$
u(\omega,\gamma) = \frac{e^{i\omega\gamma}-1}{i\omega}. \tag{32}
$$

Because the input is velocity rather than position, the
optimum filter k(ω) must be thought of as having a velocity
input. Should the input actually consist of the position, the
filter need merely be multiplied by iω to retain its optimum
properties.

The filtering error is again the difference between ideal
and actual output, that is,

$$
e(t,\gamma) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega t}\left[\frac{e^{i\omega\gamma}-1}{i\omega} - k(\omega)\right]dV(\omega). \tag{33}
$$

Averaging the squared error over γ as well as v gives the
result

$$
\epsilon^2 = \int_0^{\infty} dF(\gamma)\int_{-\infty}^{\infty}\Phi_v(\omega)\left|\frac{e^{i\omega\gamma}-1}{i\omega} - k(\omega)\right|^2 d\omega. \tag{34}
$$

The procedure of minimizing over the class of realizable
functions is the same as before. We obtain

$$
k(\omega) = \frac{1}{2\pi\,\psi_1(\omega)}\int_0^{\infty} e^{-i\omega t}
\left[\int_{-\infty}^{\infty} e^{it\rho}\,\psi_1(\rho)\,\frac{\phi(\rho)-1}{i\rho}\,d\rho\right]dt \tag{35}
$$

where φ(ω) is again the characteristic function of γ. We
note that (35) can be simplified considerably by resolving
the inner integral through contour integration in the upper
half plane.¹³ If γ has moments higher than the first [say
$\int_0^{\infty}\gamma^{1+\delta}\,dF(\gamma)$ exists for some δ > 0] there is no pole at
the origin, so that only the residue at z = i is considered.¹⁴
The final result is

$$
k(\omega) = 1 - \phi(i) = 1 - \int_0^{\infty} e^{-\gamma}\,dF(\gamma). \tag{36}
$$

¹³ Since γ must be greater than zero, φ(z) is regular in the upper
half plane.

¹⁴ The existence of these moments is a sufficient condition for
the existence of the integral (34), for then we have the expansion
$\phi(\omega) = 1 + i\omega m + o(\omega^{1+\delta})$ in which m is the mean of γ. Note, how-
ever, that the integral converges when γ has the Cauchy distribution
for which not even the first moment exists.

This result is entirely consistent with a Wiener prediction
filter; should γ = α with probability 1, $k(\omega) = 1 - e^{-\alpha}$
from (36).
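The closing formula is a constant gain applied to the measured velocity, and it can be evaluated directly once a time-of-flight distribution is chosen. The sketch below assumes the normalized velocity spectrum 1/(1 + ω²), so that time is measured in units of the velocity correlation time, and a discrete distribution for γ; both the function names and the numbers are illustrative.

```python
import math

def optimum_gain(gammas, probs):
    """Gain k = 1 - E[exp(-gamma)] for a discrete time-of-flight distribution."""
    return 1.0 - sum(p * math.exp(-g) for g, p in zip(gammas, probs))

def wiener_gain(alpha):
    """Fixed-interval prediction gain 1 - exp(-alpha)."""
    return 1.0 - math.exp(-alpha)

def predicted_displacement(v_now, gammas, probs):
    """Estimate of x(t + gamma) - x(t) from the present velocity alone."""
    return optimum_gain(gammas, probs) * v_now

# Time of flight known exactly: the two gains coincide.
gain_certain = optimum_gain([2.0], [1.0])

# Time of flight 1 or 3 with equal probability (mean 2).
gain_random = optimum_gain([1.0, 3.0], [0.5, 0.5])
gain_at_mean = wiener_gain(2.0)
lead = predicted_displacement(10.0, [1.0, 3.0], [0.5, 0.5])
```

Because exp(−γ) is convex, `gain_random` is smaller than `gain_at_mean`: with an uncertain time of flight the best linear predictor is more conservative than a Wiener predictor designed for the mean delay.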

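The mean square error which corresponds to the
optimum filter (36) is easily computable through use of
(34) and (36). This error is

$$
\epsilon^2 = -\pi\,[1-\phi(i)]^2 + 2\int_{-\infty}^{\infty}\frac{\Phi_v(\omega)}{\omega^2}\,[1 - \operatorname{Re}\phi(\omega)]\,d\omega \tag{37}
$$

which also reduces to the Wiener result if γ is certain
rather than random.

For a comparison between the optimum and any other
filter $\tilde k(\omega)$, the difference in mean square errors is expressed
by

$$
\Delta\epsilon^2 = \int_{-\infty}^{\infty}\Phi_v(\omega)\,|k(\omega) - \tilde k(\omega)|^2\,d\omega. \tag{38}
$$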


In particular, our filter outperforms the fixed prediction
interval (Wiener) filter by¹⁶

$$
\Delta\epsilon^2 = \frac{1}{2\pi}\int_0^{\infty}\left|\int_{-\infty}^{\infty} e^{it\omega}\,\psi_1(\omega)\,
\frac{\phi(\omega) - e^{i\alpha\omega}}{i\omega}\,d\omega\right|^2 dt \tag{39}
$$

where the Wiener filter has been designed on the assump-
tion that γ = α. In the event that $\Phi_v(\omega) = 1/(1+\omega^2)$,
(39) specializes to

$$
\Delta\epsilon^2 = \pi\left[e^{-\alpha} - \int_0^{\infty} e^{-\gamma}\,dF(\gamma)\right]^2 \tag{40}
$$

which may be obtained by contour integration. Thus the
Wiener filter is as effective (in the mean square sense) as
our filter if and only if γ = α with probability 1.
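The size of this penalty is easy to tabulate for a given time-of-flight distribution. The sketch below takes the excess error in the form π[e^(−α) − E e^(−γ)]², which is how (40) is read here; the constant factor π and the discrete distribution are assumptions made for illustration.

```python
import math

def excess_error(alpha, gammas, probs):
    """Excess mean square error of a Wiener predictor designed for delay alpha.

    Assumed form: pi * (exp(-alpha) - E[exp(-gamma)])**2, gamma discrete.
    """
    mean_decay = sum(p * math.exp(-g) for g, p in zip(gammas, probs))
    return math.pi * (math.exp(-alpha) - mean_decay) ** 2

def mean_delay(gammas, probs):
    """Mean time of flight, a natural design value for the fixed-interval filter."""
    return sum(p * g for g, p in zip(gammas, probs))

# Delay known exactly and filter designed for it: no penalty.
penalty_certain = excess_error(2.0, [2.0], [1.0])

# Delay 1 or 3 with equal probability, Wiener filter designed for the mean delay.
gammas, probs = [1.0, 3.0], [0.5, 0.5]
penalty_random = excess_error(mean_delay(gammas, probs), gammas, probs)
```

`penalty_certain` is zero, while designing for the mean of a spread-out delay leaves a strictly positive `penalty_random`.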
¹⁶ Eq. (39) is obtained from (38) by substituting the required
computation for $k$ and $\tilde k$, and then applying Parseval's relation.

A Criterion for the Diagonal Expan-
sion of a Second-Order Probability
Distribution in Orthogonal Poly-
nomials*

In a recent paper,¹ Barrett and Lampard
introduced an expansion for second-order
probability distributions which expresses
such a distribution as a double series in-
volving orthogonal polynomials associated
with the corresponding first-order prob-
ability distributions. Several interesting
consequences were derived for the class,
A, consisting of all second-order distribu-
tions having a diagonal expansion of the
form

$$
p(x_1; x_2) = p_1(x_1)\,p_2(x_2)\sum_{n=0}^{\infty} a_n\,\theta_n^{(1)}(x_1)\,\theta_n^{(2)}(x_2), \tag{1}
$$

but it was stated that the authors had not
found what general restrictions must be
placed on $p(x_1; x_2)$ in order that it may be-
long to A.

If the conditional moments,

$$
\frac{\int x_2^k\,p(x_1; x_2)\,dx_2}{p_1(x_1)}
\qquad\text{and}\qquad
\frac{\int x_1^k\,p(x_1; x_2)\,dx_1}{p_2(x_2)},
$$

are denoted by $m_k(x_1)$ and $m_k(x_2)$, re-
spectively (k = 0, 1, 2, ...), then a char-
acterization of the class A is given by the
following theorem.


Theorem: Assume that $p(x_1; x_2)$ can be re-
presented as a double series in the associated
polynomials. Then $p(x_1; x_2)$ belongs to A if,
and only if, the quantities $m_k(x_1)$ and
$m_k(x_2)$ are polynomials (in their respective
variables) of degree less than or equal to
k, for each positive integral value of k.


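Proof: If $p(x_1; x_2)$ has the form (1), then

$$
\begin{aligned}
m_k(x_1) &= \sum_{n=0}^{\infty} a_n\,\theta_n^{(1)}(x_1)\int x_2^k\,\theta_n^{(2)}(x_2)\,p_2(x_2)\,dx_2\\
&= \sum_{n=0}^{k} a_n\,\theta_n^{(1)}(x_1)\int x_2^k\,\theta_n^{(2)}(x_2)\,p_2(x_2)\,dx_2
\end{aligned}
$$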


* Received by the PGIT, January 30, 1958. 

1J. F. Barrett and D. G. Lampard, ‘‘An expan- 
sion for some second-order probability distributions 
and its application to noise problems,’’ IRE Trans. 
ON INFORMATION THEORY, vol. IT-1, pp. 10-15; 
March, 1955. (The notation of this reference is 
adopted in the present note.) 


since any orthogonal polynomial is orthogonal
to every polynomial of lower degree on
the same interval and with respect to the
same weighting function.² The same argument
shows that $m_k(x_2)$ is also a polynomial
of maximum degree k.

Conversely, assume $m_k(x_1)$ and $m_k(x_2)$
are each polynomials of degree less than or
equal to k for k = 0, 1, 2, .... Then for
fixed n,


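$$w_n^{(1)}(x_1) = \int \frac{p(x_1; x_2)}{p_1(x_1)}\, \theta_n^{(2)}(x_2)\, dx_2$$

is a polynomial of degree $\le n$, since it is a
linear combination of $m_0(x_1), \ldots, m_n(x_1)$.
Similarly,

$$w_m^{(2)}(x_2) = \int \frac{p(x_1; x_2)}{p_2(x_2)}\, \theta_m^{(1)}(x_1)\, dx_1$$

is a polynomial of degree $\le m$.
In the expansion

$$p(x_1; x_2) = p_1(x_1)\,p_2(x_2) \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} a_{mn}\, \theta_m^{(1)}(x_1)\, \theta_n^{(2)}(x_2) \tag{2}$$

the coefficients $a_{mn}$ are given by

$$a_{mn} = \iint p(x_1; x_2)\, \theta_m^{(1)}(x_1)\, \theta_n^{(2)}(x_2)\, dx_1\, dx_2
= \int p_1(x_1)\, w_n^{(1)}(x_1)\, \theta_m^{(1)}(x_1)\, dx_1
= 0 \quad \text{for } m > n, \tag{3}$$

using the previously stated property of
orthogonal polynomials. Also,

$$a_{mn} = \int p_2(x_2)\, w_m^{(2)}(x_2)\, \theta_n^{(2)}(x_2)\, dx_2 = 0 \quad \text{for } m < n.$$

Thus, for arbitrary m and n, $a_{mn} = 0$ for
$m \ne n$; that is, the expansion is diagonal.
Q.E.D.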


Example: To illustrate the use of this 
criterion, consider the familiar Gaussian 
process, where 


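$$p(x_1; x_2) = \frac{1}{2\pi\sigma^2\sqrt{1-\rho^2}} \exp\left[-\frac{x_1^2 - 2\rho x_1 x_2 + x_2^2}{2\sigma^2(1-\rho^2)}\right]$$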


2 D. Jackson, "Fourier Series and Orthogonal
Polynomials," Math. Assoc. of America, p. 154;
1941.
which, by inspection, is seen to be a polynomial
in $x_1$ of maximum degree k. By
symmetry, $m_k(x_2)$ is also a polynomial of
maximum degree k and, by the theorem,
$p(x_1; x_2)$ has a diagonal expansion in
orthogonal polynomials. Note that the
relevant polynomials do not have to be
known in order to apply the criterion. In
this case, for instance, the polynomials are
the Hermite polynomials and carrying out
the actual expansion is equivalent to a
derivation of Mehler's formula.¹ Since the
results cited by Barrett and Lampard de- 
pend only on the diagonal property of the 
expansion and not on the explicit form of 
the associated polynomials, the above cri- 
terion is expedient in that it is phrased 
entirely in terms of the given distribution 
and does not require construction of the 
polynomials by the rather tedious Schmidt 
process. 
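
The diagonal property in the Gaussian example can be checked numerically. The following is a minimal sketch, not part of the original note: it assumes unit variances, uses the orthonormal Hermite polynomials $He_n(x)/\sqrt{n!}$, and estimates the coefficients $a_{mn}$ by Gauss-Hermite quadrature. The function names `theta` and `expansion_coefficients` and the value $\rho = 0.6$ are illustrative choices only. If the expansion is diagonal, the off-diagonal entries should vanish and, by Mehler's formula, the diagonal entries should equal $\rho^n$.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def theta(n, x):
    """Orthonormal Hermite polynomial He_n(x) / sqrt(n!) for a unit Gaussian."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return hermeval(x, coeffs) / math.sqrt(math.factorial(n))


def expansion_coefficients(rho, order=5, nodes=40):
    """Return a[m, n] = E[theta_m(x1) * theta_n(x2)] for correlation rho."""
    z, w = hermegauss(nodes)            # nodes and weights for exp(-z**2 / 2)
    w = w / math.sqrt(2.0 * math.pi)    # rescale so the weights sum to one
    x1 = z[:, None]
    # x2 = rho * x1 + sqrt(1 - rho^2) * u, with u independent of x1
    x2 = rho * z[:, None] + math.sqrt(1.0 - rho**2) * z[None, :]
    weight = w[:, None] * w[None, :]
    a = np.empty((order, order))
    for m in range(order):
        for n in range(order):
            a[m, n] = np.sum(weight * theta(m, x1) * theta(n, x2))
    return a


a = expansion_coefficients(rho=0.6)
off_diagonal = a - np.diag(np.diag(a))
```

Here `off_diagonal` holds only the entries with $m \ne n$, and `np.diag(a)` can be compared with `0.6 ** np.arange(5)`.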

It may also be of interest to note that 
the orthogonal polynomials used in the 
Barrett-Lampard expansion¹,³ are essentially
equivalent to the moments of the two
first-order distributions; that is, the 
moments may be derived from knowledge 
of the polynomials and conversely. The 
proof, which is straightforward, is left to 
the reader. 

Lastly, it seems natural to ask whether
$p(x_1; x_2)$ can have a diagonal expansion in
terms of orthogonal polynomials different
from the polynomials of the Barrett-
Lampard expansion. If the zeroth degree
polynomials, $\phi_0^{(1)}(x_1)$ and $\phi_0^{(2)}(x_2)$, of an
alternative polynomial representation are
each taken as unity, then the polynomials
will be those of the Barrett-Lampard expansion.
The proof, which presents no
difficulties, shows that the Barrett-Lampard
expansion is unique in the sense that any
diagonal expansion of $p(x_1; x_2)$ in orthogonal
polynomials must be of this type.

John L. Brown, Jr.
Ordnance Res. Lab. 
Pennsylvania State University 
University Park, Pa. 


3 J. L. Brown, Jr., "On a cross-correlation property
for stationary random processes," IRE Trans.
on Information Theory, vol. IT-3, pp. 28-31;
March, 1957. (General properties of the nondiagonal
expansion are stated.)


On Signal Parameter Estimation*


The purpose of this note is to point out a
simple but interesting result which is obtained
when the ideas of Statistical Decision
Theory¹ are applied to estimate the parameters
of signals represented by a linear
expansion of orthogonal functions.²

Suppose a signal S(t) is represented by
an expansion of the form

$$S(t) = \sum_{j=1}^{\infty} a_j\,\phi_j(t) \tag{1}$$

and vector $S = (a_1, a_2, \cdots, a_j, \cdots)$, where
the $\phi_j$ are orthonormal on some interval
that completely spans the interval (0, T),
and S(t) is assumed to exist only in (0, T).
Let S(t) be imbedded in additive white,
Gaussian noise N(t) whose spectral density
is $N_0$ watts/rad/second. The combination
is the data, V(t):

$$V(t) = S(t) + N(t).$$

A Bayes estimator, $\gamma(V)$, of S on the basis
of the data V(t) is desired in the form of a
set of estimates of its expansion coefficients.
Two cost functions are considered:


a) $C(S, \gamma) = \| S - \gamma \|^2$,
the quadratic cost function;
b) $C(S, \gamma) = A - \delta(S - \gamma)$,


the "simple" cost function. (A is a positive
constant and $\delta$ is the Dirac delta
function.)


The quadratic cost function a) provides
a cost which is equal to the squared error of
the estimate. The simple cost function b)
states that the cost of any incorrect estimate
is A while that of a correct estimate
is $-\infty$. These cost functions are discussed
more fully in Middleton and Van Meter.³
The optimum estimator for the quadratic
cost function is, from (4.8) of Middleton
and Van Meter,¹


$$\gamma_\sigma^*(V) = \frac{\int_\Omega S\,\sigma(S)\,F(V \mid S)\,dS}{\int_\Omega \sigma(S)\,F(V \mid S)\,dS}. \tag{2}$$


$\sigma(S)$ is the a priori distribution of S in
signal space $\Omega$. Since S here is represented
by (1), $\sigma(S)$ is written in terms of the
probability densities of the individual
parameters, $a_j$. These may or may not be
independent of each other.

Let V(t) also be represented by an expansion
of the form


* Received by the PGIT, June 4, 1958. 

1 D. Middleton and D. Van Meter, "Detection
and extraction of signals in noise from the point of
view of statistical decision theory," parts I and II,
J. Soc. Indus. Appl. Math., vol. 3, p. 192; December,
1955, and vol. 4, p. 86; June, 1956.

2 W. H. Huggins, "Signal theory," IRE Trans.
on Circuit Theory, vol. CT-3, pp. 210-216; December,
1956.


3 Middleton and Van Meter, op. cit., part II, 
$$V(t) = \sum_{j=1}^{\infty} b_j\,\phi_j(t), \qquad 0 < t \le T \tag{3}$$

and vector $V = (b_1, b_2, \cdots, b_j, \cdots)$.
For white, Gaussian noise

$$N(t) = \sum_{j=1}^{\infty} c_j\,\phi_j(t), \qquad N = (c_1, c_2, \cdots, c_j, \cdots),$$

where the $c_j$ are normally distributed with
mean $\mu_j = 0$ and variance $\sigma_j^2 = N_0$.

Eq. (2) is a vector relationship. For the
kth component we have

$$[\gamma_\sigma^*(V)]_k = \frac{\int_\Omega a_k\,\sigma(S)\,F(V \mid S)\,dS}{\int_\Omega \sigma(S)\,F(V \mid S)\,dS}. \tag{4}$$


The covariance matrix, $K_N$, of the noise
has terms defined by

$$[K_N]_{jk} = \overline{c_j c_k} = N_0\,\delta_{jk} \tag{5a}$$

and

$$[K_N^{-1}]_{jk} = \frac{\delta_{jk}}{N_0}. \tag{5b}$$

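Then $F(V \mid S)$, the likelihood function, can
be written

$$F(V \mid S) = \prod_j \frac{1}{\sqrt{2\pi N_0}} \cdot \exp\left[-\frac{1}{2N_0} \sum_j (b_j - a_j)^2\right]. \tag{6}$$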


We now assume that $\sigma(S)$ can be written in
the form

$$\sigma(S) = \sigma_1(a_1)\,\sigma_2(a_2) \cdots \sigma_j(a_j) \cdots \tag{7}$$

This assumption is quite cautious and
perhaps unrealistic in that all signal parameters
are assumed to be independent of each
other; knowledge of one is of no help in
obtaining information about the others.
With actual signals there is apt to be considerable
interdependence of the parameters.
In this sense the estimators that
follow below assume the worst possible a
priori distribution for S.⁴


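With (6) and (7), the kth component (4) reduces to

$$[\gamma_\sigma^*(V)]_k = \frac{\int a_k\,\sigma_k(a_k) \exp\left[-\frac{(b_k - a_k)^2}{2N_0}\right] da_k}{\int \sigma_k(a_k) \exp\left[-\frac{(b_k - a_k)^2}{2N_0}\right] da_k}. \tag{8}$$

$\gamma_\sigma^*(V)$ is simply the conditional expectation
of S given the data V.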

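If we also assume that the $a_j$ are uniformly
distributed, $\sigma_k(a_k)$ is a constant and
(4) becomes

$$[\gamma_\sigma^*(V)]_k = \frac{1}{\sqrt{2\pi N_0}} \int_{-\infty}^{\infty} a_k \exp\left[-\frac{(b_k - a_k)^2}{2N_0}\right] da_k \tag{9}$$

$$= b_k. \tag{10}$$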


That is, for a signal whose orthogonal components
are statistically independent and
uniformly distributed, the quadratic cost
function yields a Bayes estimator for the
kth component of S:

$$[\gamma_\sigma^*(V)]_k = b_k = \int_0^T V(t)\,\phi_k(t)\,dt. \tag{11}$$
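
As an illustration of the projection $b_k = \int_0^T V(t)\,\phi_k(t)\,dt$, the following minimal sketch evaluates the integral on a sampled grid. It is not taken from the note: the sine basis $\sqrt{2/T}\,\sin(j\pi t/T)$, the coefficient values, the value of $N_0$ and the helper names `basis` and `project` are all assumptions chosen for the example. The white noise is replaced by a discrete stand-in whose per-sample variance $N_0/\Delta t$ gives each projected coefficient a variance of $N_0$.

```python
import numpy as np


def basis(j, t, T):
    """Hypothetical orthonormal set on (0, T): sqrt(2/T) * sin(j*pi*t/T)."""
    return np.sqrt(2.0 / T) * np.sin(j * np.pi * t / T)


def project(v, t, T, num_terms):
    """Approximate b_k = integral over (0, T) of v(t) * phi_k(t) by the midpoint rule."""
    dt = T / len(t)
    return np.array([np.sum(v * basis(k, t, T)) * dt
                     for k in range(1, num_terms + 1)])


T = 1.0
num_samples = 2000
dt = T / num_samples
t = (np.arange(num_samples) + 0.5) * dt          # midpoints of the sampling grid
a = np.array([1.5, -0.7, 0.3])                   # illustrative signal coefficients
signal = sum(a_j * basis(j, t, T) for j, a_j in enumerate(a, start=1))

N0 = 1e-4                                        # assumed noise level
rng = np.random.default_rng(0)
# Discrete stand-in for white noise: per-sample variance N0/dt makes each
# projected noise coefficient c_k have variance N0.
noise = rng.normal(scale=np.sqrt(N0 / dt), size=t.size)

b_clean = project(signal, t, T, num_terms=3)          # recovers a
b_noisy = project(signal + noise, t, T, num_terms=3)  # a_k + c_k
```

Without noise the projections reproduce the coefficients; with noise each `b_noisy[k]` scatters about the corresponding coefficient with standard deviation of about $\sqrt{N_0}$.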


The optimum estimator $\hat{S}$ for the simple
cost function is [from (4.4a) of Middleton
and Van Meter¹] the maximum likelihood
estimate defined by

$$\sigma(\hat{S})\,F(V \mid \hat{S}) \ge \sigma(S)\,F(V \mid S). \tag{12}$$
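Again, with white, Gaussian background
noise and $\sigma(S)$ defined as in (7),

$$\sigma(S)\,F(V \mid S) = \prod_j \frac{\sigma_j(a_j)}{\sqrt{2\pi N_0}} \exp\left[-\frac{(b_j - a_j)^2}{2N_0}\right], \tag{13}$$

from which, for uniformly distributed $a_j$, the
kth component of $\hat{S}$ is simply

$$\hat{S}_k = b_k = \int_0^T V(t)\,\phi_k(t)\,dt. \tag{14}$$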


It is also true that the $b_k$ are Minimax estimators
(Section 4.4 of Middleton and Van
Meter¹) of the $a_k$.

If the $a_j$ are independent and normally
distributed with mean $\mu_{a_j}$, variance $\sigma_{a_j}^2$,
then (9) becomes


$$[\gamma_\sigma^*(V)]_k = \frac{N_0\,\mu_{a_k} + \sigma_{a_k}^2\, b_k}{N_0 + \sigma_{a_k}^2}. \tag{15}$$
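
For a normally distributed coefficient observed in Gaussian noise of variance $N_0$, the conditional mean takes the standard weighted-average form, which the short sketch below evaluates. The function name `posterior_mean` and the numerical values are hypothetical and serve only as an illustration. Because the posterior density is itself Gaussian, its mode coincides with its mean, so the same expression also maximizes $\sigma(S)F(V \mid S)$ under this prior.

```python
def posterior_mean(b_k, prior_mean, prior_var, noise_var):
    """Conditional mean of a_k given b_k = a_k + c_k, with Gaussian a_k and c_k."""
    return (prior_var * b_k + noise_var * prior_mean) / (prior_var + noise_var)


# Equal prior and noise variances: the estimate lies midway between prior mean and b_k.
balanced = posterior_mean(b_k=2.0, prior_mean=0.0, prior_var=1.0, noise_var=1.0)

# A very broad prior approaches the uniform case, and the estimate tends to b_k.
broad = posterior_mean(b_k=2.0, prior_mean=0.0, prior_var=1e12, noise_var=1.0)
```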
4 See also Middleton and Van Meter, op. cit.,
part II, p. 97.

Similarly, (13) becomes, under the previous assumptions,


$\sigma(S)\,F(V \mid S)$
Thus the $b_j$ are unbiased estimators for this
a priori distribution of S. It is also to be
noted that the $b_j$ are easy to obtain via
electronic instrumentation.⁵ If $\sigma(S)$ and $N_0$
are known, the latter estimators are also
easy to obtain.

E. M. Glaser

J. H. Park, Jr.

Radiation Lab. 

The Johns Hopkins Univ. 

Baltimore 2, Md. 


5 J. H. Park, Jr. and E. M. Glaser, "The extraction
of waveform information by a delay line
filter technique," 1957 IRE WESCON Convention
Record, vol. 1, pt. 2, p. 171.


On Manasse, Price, and Lerner,
"Loss of Signal Detectability in
Band-Pass Limiters"*


Manasse, Price, and Lerner have recently 
demonstrated¹ the remarkable fact that it
is possible to reduce the signal-detectability 


* Received by the PGIT, July 15, 1958. 

1 R. Manasse, R. Price, and R. M. Lerner, "Loss
of signal detectability in band-pass limiters," IRE
Trans. on Information Theory, vol. IT-4, pp.
34-38; March, 1958.


PGIT News 


The Professional Group on Information 
Theory, in conjunction with the Profes- 
sional Group on Circuit Theory, is sponsor- 
ing an International Symposium on Circuit 
and Information Theory, to be held at the 
University of California at Los Angeles on 
June 16-18, 1959. The purpose of the Sym- 
posium will be to consider recent advances 
in Information Theory and Circuit Theory, 
and in particular to explore areas of interest 
common to the two disciplines. A partial
list of topics tentatively planned
for the technical program is: application
of linear graph theory to communication 
degradation factor of a band-pass limiter
to unity by adding strong noise of low 
spectral intensity and sufficient bandwidth 
to its input. The purpose of this letter is to 
elucidate the phenomenon. 

The input to the band-pass limiter consists
of three components, which we may
call the "input signal," the "original noise,"
and the "added noise." If the input consisted
of the added noise alone, the output
would be a broad, low spectrum of noise, 
putting negligible power into the band 
occupied by the original signal and noise. 
The effect of including the input signal and 
original noise is essentially to add these 
voltages, both attenuated equally but 
undistorted, to the output. Since the spec- 
tral intensity of the output due to the added 
noise can be made negligible compared to 
that due to the original noise within the 
band occupied by the latter by making the 
spectrum of the added noise sufficiently 
broad and low, it follows that the output 
signal-to-noise ratio in the band occupied 
by the original signal and the noise is the 
same as the original input signal-to-noise 
ratio; i.e., there is no loss in detectability.

Of course, the same result can be obtained 
much more simply and with less attenuation 
by just omitting the limiter and not adding 
extra noise to the input nor filtering it out 
of the output. Thus, the effect of adding 
the extra noise, limiting, and filtering is 


nets and circuits; switching circuits and 
coding; applications of matrix theory to 
circuit and information theory; specifica- 
tion and synthesis of matched filters; net- 
works with random parameters; and charac- 
terization and optimization of nonlinear 
filters. 

The organization of the Symposium will 
follow that of previous years. The Trans- 
actions of the Symposium will be published 
in advance, and adequate time will be 
allowed at the meeting for active participa- 
tion and discussion from the floor. Every 
effort will be made to establish the atmos- 




just to attenuate the original input by
something like the original-input-to-added-noise
ratio. To see that this is the case, we
refer to Blachman,² where the band-pass
limiter is the special case m = 1, n = 0.

Although this paper does not so state,
its analysis applies to the case of a narrow-band
signal as well as to the case of a sinusoidal
signal, the output signal being defined
as the ensemble-average output. Eq. (3) of
Blachman,² which represents the total output
power (signal plus noise), must be averaged
over the distribution of the signal
amplitude A if A is not constant. Eq. (4)
represents the amplitude of the signal output;
its square must be averaged over the
distribution of A to obtain the output signal
power if A is not constant. Thus, (5)
gives the output signal-to-noise ratio if F
is taken to be the ratio of the average input
signal power to the average input noise
power. [Incidentally, (5) of Blachman³ has
exactly the same form as (4) of Blachman²
for the case m = 1, n = 0.]

For small input signal-to-noise ratios, (4) 
is proportional to A, and the phase of the 
output signal is always exactly equal to
the phase of the input signal. Hence, for
small signal-to-noise ratios, the output sig- 
nal is an attenuated but undistorted version 
of the input signal. This conclusion can be 
applied to the case where strong noise is 
added to the input of a band-pass limiter 
by regarding the original input signal and 
noise as a new signal. Thus, this new signal 
(i.e., the original input signal and noise) 
appears undistorted though attenuated in 
the output. 

Nelson M. Blachman
Office of Naval Res. 
London, England 

On leave from Electronic 
Defense Lab. 

Mountain View, Calif. 


2 N. M. Blachman, "The output signal-to-noise
ratio of a power-law device," J. Appl. Phys., vol. 24,
pp. 783-785; June, 1953.

3 N. M. Blachman, "The demodulation of an
F-M carrier and random noise by a limiter and discriminator,"
J. Appl. Phys., vol. 20, pp. 38-47;
January, 1949.


phere of a forum, rather than that of a 
lecture hall. 

The deadline for submission of detailed 
750-word summaries (in triplicate) of papers 
for presentation at the Symposium is De- 
cember 22, 1958. In addition to papers 
which present new results, tutorial papers, 
especially those which expound advanced 
mathematical techniques of value to circuit 
and information theorists, will be con- 
sidered. 

All correspondence, including summaries, 
should be sent to: Dr. G. L. Turin, Hughes 
Research Laboratories, Culver City, Calif. 
biography, please see page 60 of the March,